Angular Softmax for Short-Duration Text-independent Speaker Verification
Zili Huang, Shuai Wang and Kai Yu
Abstract:
Recently, researchers have proposed deep learning based end-to-end speaker verification (SV) systems and achieved results competitive with the standard i-vector approach. In addition to the deep learning architecture, the optimization metric, such as softmax loss or triplet loss, is important for extracting speaker embeddings that are discriminative and generalizable to unseen speakers. In this paper, the angular softmax (A-softmax) loss is introduced to improve speaker embedding quality. It is investigated in two SV frameworks: a CNN based end-to-end SV framework and an i-vector SV framework in which deep discriminant analysis is used for channel compensation. Experimental results on a short-duration text-independent speaker verification dataset generated from SRE show that A-softmax achieves significant performance improvements over other metrics in both frameworks.
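For reference, the following is a minimal PyTorch sketch of an A-softmax (SphereFace-style) margin loss of the kind the abstract refers to. It is an illustration of the general formulation only, not the authors' implementation: the class name AngularSoftmaxLoss, the parameter names, and the default margin m=4 are assumptions, and the annealing between plain cosine logits and margin logits used in practical SphereFace training is omitted for brevity.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class AngularSoftmaxLoss(nn.Module):
    """Sketch of the A-softmax loss: bias-free classification layer with
    L2-normalised class weights, where the target-class logit cos(theta_y)
    is replaced by psi(theta_y) = (-1)^k cos(m*theta_y) - 2k,
    theta_y in [k*pi/m, (k+1)*pi/m], scaled by the embedding norm ||x||."""

    def __init__(self, embed_dim, num_classes, margin=4):
        super().__init__()
        self.margin = margin
        self.weight = nn.Parameter(torch.empty(num_classes, embed_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, embeddings, labels):
        # cos(theta_j) between each embedding and each normalised class weight
        w = F.normalize(self.weight, dim=1)
        x_norm = embeddings.norm(dim=1, keepdim=True)            # ||x||
        cos_theta = F.normalize(embeddings, dim=1) @ w.t()       # (batch, classes)
        cos_theta = cos_theta.clamp(-1.0 + 1e-7, 1.0 - 1e-7)     # keep acos stable

        # margin function psi(theta), evaluated for all classes
        theta = torch.acos(cos_theta)
        k = torch.floor(self.margin * theta / math.pi)
        sign = 1.0 - 2.0 * torch.remainder(k, 2)                 # (-1)^k
        psi = sign * torch.cos(self.margin * theta) - 2.0 * k

        # use psi(theta) for the target class, cos(theta) for the others
        one_hot = F.one_hot(labels, num_classes=cos_theta.size(1)).bool()
        logits = torch.where(one_hot, psi, cos_theta) * x_norm
        return F.cross_entropy(logits, labels)

During training such a module would be applied to fixed-dimensional speaker embeddings and their speaker labels, e.g. loss = AngularSoftmaxLoss(512, num_speakers)(embeddings, labels); at test time the classification layer is discarded and the embeddings are scored directly.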
Cite as: Huang, Z., Wang, S., Yu, K. (2018) Angular Softmax for Short-Duration Text-independent Speaker Verification. Proc. Interspeech 2018, 3623-3627, DOI: 10.21437/Interspeech.2018-1545.
BibTeX Entry:
@inproceedings{Huang2018,
  author={Zili Huang and Shuai Wang and Kai Yu},
  title={Angular Softmax for Short-Duration Text-independent Speaker Verification},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={3623--3627},
  doi={10.21437/Interspeech.2018-1545},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1545}
}