Speaker Embedding Extraction with Phonetic Information
Yi Liu, Liang He, Jia Liu and Michael T. Johnson
Abstract:
Speaker embeddings achieve promising results on many speaker verification tasks. Phonetic information, although an important component of speech, is rarely considered when extracting speaker embeddings. In this paper, we introduce phonetic information into speaker embedding extraction based on the x-vector architecture. We propose two methods, one using phonetic vectors and one using multi-task learning. On the Fisher dataset, our best system outperforms the original x-vector approach by 20% in EER and by 15% in both minDCF08 and minDCF10. Experiments conducted on NIST SRE10 further demonstrate the effectiveness of the proposed methods.
Cite as: Liu, Y., He, L., Liu, J., Johnson, M.T. (2018) Speaker Embedding Extraction with Phonetic Information. Proc. Interspeech 2018, 2247-2251, DOI: 10.21437/Interspeech.2018-1226.
BibTeX Entry:
@inproceedings{Liu2018,
author={Yi Liu and Liang He and Jia Liu and Michael T. Johnson},
title={Speaker Embedding Extraction with Phonetic Information},
year=2018,
booktitle={Proc. Interspeech 2018},
pages={2247--2251},
doi={10.21437/Interspeech.2018-1226},
url={http://dx.doi.org/10.21437/Interspeech.2018-1226}
}