Session: ASR Robustness (Feature Extraction, Acoustic Modeling and Adaptation)

Title: MULTIPLE TIME RESOLUTIONS FOR DERIVATIVES OF MEL-FREQUENCY CEPSTRAL COEFFICIENTS

Authors: Georg Stemmer, Christian Hacker, Elmar Nöth, Heinrich Niemann

Abstract: Most speech recognition systems are based on mel-frequency cepstral coefficients and their first- and second-order derivatives. The derivatives are normally approximated by fitting a linear regression line to a fixed-length segment of consecutive frames. The time resolution and smoothness of the estimated derivative depend on the length of the segment. We present an approach to improve the representation of speech dynamics, which is based on the combination of multiple time resolutions. The resulting feature vector is transformed to reduce its dimension and the correlation between the features. Another possibility, which has also been evaluated, is to use probabilistic PCA (PPCA) for the output distributions of the HMMs. Different configurations of multiple time resolutions are evaluated as well. When compared to the baseline system, a significant reduction of the word error rate can be achieved.
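
The regression-based derivative and the combination of several time resolutions described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the edge-padding at utterance boundaries, and the half-widths (1, 2, 4) are assumptions chosen for illustration, and the subsequent dimension-reducing transform is omitted.

```python
import numpy as np


def regression_delta(features, half_width):
    """Slope of a least-squares line fitted over 2*half_width+1 frames.

    features: array of shape (num_frames, num_coefficients).
    Edge frames are repeated at the boundaries (an assumption of this sketch).
    """
    num_frames = features.shape[0]
    padded = np.pad(features, ((half_width, half_width), (0, 0)), mode="edge")
    offsets = np.arange(-half_width, half_width + 1)
    denom = np.sum(offsets ** 2)
    delta = np.zeros_like(features, dtype=float)
    for k in offsets:
        # Frame t + k of the original sequence sits at index t + half_width + k.
        start = half_width + k
        delta += k * padded[start:start + num_frames]
    return delta / denom


def multi_resolution_deltas(features, half_widths=(1, 2, 4)):
    """Stack static features with derivatives at several time resolutions.

    The half-widths are hypothetical; the abstract does not state which
    segment lengths were used.
    """
    parts = [features] + [regression_delta(features, w) for w in half_widths]
    return np.hstack(parts)
```

A short segment gives a derivative estimate with fine time resolution but little smoothing, while a long segment gives a smoother estimate that reacts more slowly. Stacking several of them lets a later transform, such as the dimension reduction mentioned in the abstract, pick out the useful directions.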

a01gs046.ps a01gs046.pdf