Improving Sparse Representations in Exemplar-Based Voice Conversion with a Phoneme-Selective Objective Function
Shaojin Ding, Guanlong Zhao, Christopher Liberatore and Ricardo Gutierrez-Osuna
Abstract:
The acoustic quality of exemplar-based voice conversion (VC) degrades whenever the phoneme labels of the selected exemplars do not match the phonetic content of the frame being represented. To address this issue, we propose a Phoneme-Selective Objective Function (PSOF) that promotes a sparse representation of each speech frame with exemplars from a few phoneme classes. Specifically, PSOF enforces group sparsity on the representation, where each group corresponds to a phoneme class. Under the proposed objective function, the representation weights of the exemplars within a phoneme class tend to be activated or suppressed together. We conducted two sets of experiments on the ARCTIC corpus to evaluate the proposed method. First, we evaluated the ability of PSOF to reduce phoneme mismatches. Then, we assessed its performance on a VC task and compared it against three baseline methods from previous studies. Results from objective measurements and subjective listening tests show that the proposed method effectively reduces phoneme mismatches and significantly improves VC acoustic quality while retaining the voice identity of the target speaker.
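The group-sparsity idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of PSOF, so the code below assumes a generic group-lasso (L2,1) penalty and its block soft-thresholding step as a stand-in; the function names, the weights, the phoneme-class labels, and the threshold `lam` are all hypothetical and are not taken from the paper.

```python
import numpy as np

def group_l21_penalty(weights, groups):
    """Sum of per-group L2 norms (assumed group-lasso penalty, not the paper's PSOF)."""
    return sum(np.linalg.norm(weights[groups == g]) for g in np.unique(groups))

def group_soft_threshold(weights, groups, lam):
    """Proximal step for lam * L2,1 penalty: each group is shrunk or zeroed as a block."""
    out = np.zeros_like(weights, dtype=float)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(weights[idx])
        if norm > lam:
            # The whole group survives, scaled by a common factor.
            out[idx] = (1.0 - lam / norm) * weights[idx]
        # Otherwise the whole group stays at zero.
    return out

# Hypothetical activation weights for six exemplars from three phoneme classes.
weights = np.array([0.9, 0.8, 0.05, 0.1, 0.0, 0.02])
groups = np.array([0, 0, 1, 1, 2, 2])  # phoneme-class label of each exemplar

shrunk = group_soft_threshold(weights, groups, lam=0.2)
print(shrunk)
```

In this toy setting the two exemplars of class 0 remain active together, while the weakly activated classes 1 and 2 are suppressed as whole groups, which is the behavior the abstract describes qualitatively.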
Cite as: Ding, S., Zhao, G., Liberatore, C., Gutierrez-Osuna, R. (2018) Improving Sparse Representations in Exemplar-Based Voice Conversion with a Phoneme-Selective Objective Function. Proc. Interspeech 2018, 476-480, DOI: 10.21437/Interspeech.2018-1272.
BibTeX Entry:
@inproceedings{Ding2018,
author={Shaojin Ding and Guanlong Zhao and Christopher Liberatore and Ricardo Gutierrez-Osuna},
title={Improving Sparse Representations in Exemplar-Based Voice Conversion with a Phoneme-Selective Objective Function},
year=2018,
booktitle={Proc. Interspeech 2018},
pages={476--480},
doi={10.21437/Interspeech.2018-1272},
url={http://dx.doi.org/10.21437/Interspeech.2018-1272} }