
Exploring the Impact of Advanced Front-End Processing on NIST Speaker Recognition Microphone Tasks

William Campbell, Doug Sturim, Jonas Borgstrom, Robert Dunn, Alan McCree, Tom Quatieri and Doug Reynolds

Abstract

The NIST speaker recognition evaluations (SREs) from 2005 to 2010 featured microphone data. This data has typically been preprocessed and used at telephone bandwidth and with telephone quantization. Although this approach is viable, it ignores the richer properties of the microphone data, such as multiple channels, high-rate sampling, linear encoding, and ambient noise properties. In this paper, we explore alternative preprocessing choices and examine their effects on speaker recognition performance. Specifically, we consider the effects of quantization, sampling rate, speech enhancement, and two-channel speech activity detection. Experiments on the NIST 2010 SRE interview microphone corpus demonstrate that performance can be dramatically improved with a different preprocessing chain.
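To make the quantization contrast concrete, the sketch below compares telephone-style 8-bit μ-law companding with 16-bit linear PCM on a synthetic low-amplitude tone. It is an illustration under stated assumptions, not the authors' preprocessing chain. It uses the continuous μ-law curve (μ = 255), which approximates the segmented G.711 codec. The function names (`quantize_uniform`, `mulaw_quantize`, `snr_db`), the 440 Hz test tone, and the SNR measure are hypothetical choices made for this example.

```python
import math

MU = 255.0  # mu-law compression constant used in North American telephony (G.711)


def quantize_uniform(x: float, bits: int) -> float:
    """Round x in [-1, 1] to a signed integer grid of 2**bits levels and map back."""
    levels = 2 ** (bits - 1)  # e.g. 32768 steps per unit amplitude for 16-bit PCM
    q = round(x * levels)
    q = max(-levels, min(levels - 1, q))  # clip to the signed integer range
    return q / levels


def mulaw_quantize(x: float, bits: int = 8) -> float:
    """Compress with the continuous mu-law curve, quantize uniformly, then expand.

    Assumption: continuous mu-law, not the piecewise-linear G.711 segments.
    """
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    yq = quantize_uniform(y, bits)
    # Inverse curve: |x| = ((1 + MU)**|y| - 1) / MU
    return math.copysign(math.expm1(abs(yq) * math.log1p(MU)) / MU, yq)


def snr_db(clean, degraded) -> float:
    """Signal-to-quantization-noise ratio in dB."""
    signal = sum(s * s for s in clean)
    noise = sum((s - d) ** 2 for s, d in zip(clean, degraded))
    return 10.0 * math.log10(signal / noise)


if __name__ == "__main__":
    fs = 8000  # hypothetical: one second of audio at telephone sampling rate
    tone = [0.01 * math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
    linear16 = [quantize_uniform(s, 16) for s in tone]
    ulaw8 = [mulaw_quantize(s, 8) for s in tone]
    print(f"16-bit linear SNR: {snr_db(tone, linear16):.1f} dB")
    print(f"8-bit mu-law SNR:  {snr_db(tone, ulaw8):.1f} dB")
```

The tone is kept at a low level (about -40 dBFS) because this is where the difference shows most clearly. μ-law keeps its relative error roughly constant across amplitudes. Linear 16-bit PCM, by contrast, has a fixed absolute step that remains small compared with quiet speech.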

Keywords

Text-Independent Speaker Recognition
Robustness in Channels
Features for Speaker Recognition