Chair: Marc A. Zissman, MIT Lincoln Laboratory (USA)
Marc A. Zissman, MIT Lincoln Laboratory (USA)
A language identification technique using multiple single-language phoneme recognizers followed by n-gram language models yielded top performance at the March 1994 NIST language identification evaluation. Since the NIST evaluation, work has been aimed at further improving performance by using the acoustic likelihoods emitted from gender-dependent phoneme recognizers to weight the phonotactic likelihoods output from gender-dependent language models. We have investigated the effect of restricting processing to the most highly discriminating n-grams, and we have also added explicit duration modeling at the phonotactic level. On the OGI Multi-language Telephone Speech Corpus, accuracy on an 11-language, closed-set language identification task has risen to 89% on 45-s utterances and 79% on 10-s utterances. Two-language classification accuracy is 98% and 95% for the 45-s and 10-s utterances, respectively. Finally, we have started to apply these same techniques to the problem of dialect identification.
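The phonotactic scoring step described above can be illustrated with a minimal sketch. It assumes a single phone tokenization and add-one smoothed bigram models, which simplifies the abstract's setup of multiple single-language recognizers. The toy phone sequences, the language labels `L1` and `L2`, and the function names are all hypothetical.

```python
import math
from collections import Counter

def train_bigram(sequences, vocab, alpha=1.0):
    """Return a function giving add-alpha smoothed bigram log-probabilities."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq)  # start symbol supplies context for the first phone
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    v = len(vocab)

    def logprob(prev, cur):
        return math.log((pair_counts[(prev, cur)] + alpha)
                        / (context_counts[prev] + alpha * v))
    return logprob

def score(logprob, seq):
    """Log-likelihood of a phone sequence under one language's bigram model."""
    padded = ["<s>"] + list(seq)
    return sum(logprob(p, c) for p, c in zip(padded, padded[1:]))

def identify(models, seq):
    """Pick the language whose phonotactic model scores the sequence highest."""
    return max(models, key=lambda lang: score(models[lang], seq))

# Hypothetical toy data: phone strings as a recognizer might emit them.
vocab = {"a", "b", "k", "s"}
models = {
    "L1": train_bigram([["a", "b", "a", "b"], ["b", "a", "b", "a"]], vocab),
    "L2": train_bigram([["k", "s", "k", "s"], ["s", "k", "s", "k"]], vocab),
}
print(identify(models, ["a", "b", "a"]))
```

In a fuller system along the lines the abstract describes, each recognizer's output would be scored by its own set of language models and the resulting likelihoods combined. This sketch omits that step, as well as the acoustic weighting and duration modeling.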
Shubha Kadambe, AT&T Bell Laboratories (USA)
James L. Hieronymus, AT&T Bell Laboratories (USA)
A task-independent spoken Language Identification (LID) system which uses phonological and lexical models to distinguish languages is described in this paper. We demonstrate that the performance of an LID system which is based only on acoustic models can be improved by incorporating higher level linguistic knowledge in the form of trigram phonemotactics and lexical matching. We also present the performance of our LID system for four languages (English, German, Mandarin and Spanish).
Yonghong Yan, Oregon Graduate Institute of Science & Technology (USA)
Etienne Barnard, Oregon Graduate Institute of Science & Technology (USA)
An approach to Language Identification (LID) based on language-dependent phone recognition is presented in this paper. A variety of features and their combinations extracted by language-dependent recognizers were evaluated based on the same database. Two novel information sources for LID were introduced: (1) forward and backward bigram based language models, and (2) context-dependent duration models. An LID system using Hidden Markov Models and a neural network was developed. The system was trained and evaluated using the OGI_TS database. For a six-language task, the system performance (correct rate) for 45-second and 10-second utterances reached 91.96% and 81.08%, respectively. The experiments demonstrated the importance of detailed modeling and the method by which these information sources are combined.
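One way to read the forward and backward bigram models is as two conditional estimates over the same phone string: the forward model predicts a phone from its predecessor, the backward model predicts it from its successor. The sketch below is an illustration under that assumption, not the authors' implementation. The add-one smoothing, the toy data, and the function names are hypothetical.

```python
import math
from collections import Counter

def bigram_logprob_fn(seq_list, vocab_size, alpha=1.0):
    """Add-alpha smoothed log P(target | context) over adjacent symbols."""
    pairs, contexts = Counter(), Counter()
    for seq in seq_list:
        for ctx, tgt in zip(seq, seq[1:]):
            pairs[(ctx, tgt)] += 1
            contexts[ctx] += 1
    return lambda ctx, tgt: math.log(
        (pairs[(ctx, tgt)] + alpha) / (contexts[ctx] + alpha * vocab_size))

def forward_backward_scores(train, test, vocab_size):
    """Score `test` with a forward model and with a backward model."""
    fwd = bigram_logprob_fn(train, vocab_size)
    # Training on reversed sequences yields P(phone | following phone).
    bwd = bigram_logprob_fn([seq[::-1] for seq in train], vocab_size)
    f_score = sum(fwd(c, t) for c, t in zip(test, test[1:]))
    rev = test[::-1]
    b_score = sum(bwd(c, t) for c, t in zip(rev, rev[1:]))
    return f_score, b_score

# Hypothetical toy phone string over a three-symbol inventory.
train = [["t", "a", "t", "a", "k"]]
f_score, b_score = forward_backward_scores(train, ["a", "k"], vocab_size=3)
print(f_score, b_score)
```

The two scores differ because the context counts differ in each direction, which is why the pair can carry complementary information for a downstream combiner.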
Kung-Pu Li, ITT/ACD (USA)
Previously, automatic language identification systems provided good results by using syllabic onset spectral features; they identified languages by finding the training speakers closest to the test utterance. In this paper we show that augmenting the training data by adding speakers achieves a better gender balance in the data and reduces the error rate by more than 10%. Features such as syllabic coda and prosodic features give very different results, which can then be merged with those from the syllabic onset spectral features to reduce errors by an additional 10%. Dimensionality reduction by means of principal components not only reduces computation and memory requirements, but also improves language identification performance when the eigenvectors are normalized with different weights. The combination of all these factors yields a significant improvement in performance when compared with the previous baseline system.
Eluned S. Parris, Ensigma Limited (U.K.)
Michael J. Carey, Ensigma Limited (U.K.)
Language identification experiments have been carried out on language pairs taken from seven of the languages in the OGI Multi-language Telephone Speech Corpus. This builds on our previous work but introduces new techniques which are used to exploit the acoustic and phonetic differences between the languages. Subword Hidden Markov Models for the pair of languages are matched to unknown utterances, resulting in three measures: the acoustic match, the phoneme frequencies and frequency histograms. Each of these measures gives 80-90% accuracy in discriminating language pairs. However, these multiple knowledge sources are also combined to give improved results. Majority decision, logistic regression and a linear classifier were compared as data fusion techniques. The linear classifier performed the best, giving an average accuracy of 87-93% on the pairs from the seven languages.
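The difference between majority decision and a linear classifier as fusion rules can be seen in a small sketch. It assumes each of the three measures yields a signed score, positive favoring the first language of the pair. The scores, weights, and labels `A` and `B` are hypothetical, and a real linear classifier would learn its weights from training data.

```python
def majority_decision(scores):
    """Each measure votes: a positive score favors language A, otherwise B."""
    votes_a = sum(1 for s in scores if s > 0)
    return "A" if votes_a > len(scores) / 2 else "B"

def linear_decision(scores, weights, bias=0.0):
    """Weighted sum of the measure scores compared against zero."""
    total = sum(w * s for w, s in zip(weights, scores)) + bias
    return "A" if total > 0 else "B"

# Hypothetical scores: acoustic match, phoneme frequencies, frequency histograms.
scores = [2.5, -0.2, -0.3]
weights = [1.0, 1.0, 1.0]  # illustrative only, not trained values

print(majority_decision(scores))
print(linear_decision(scores, weights))
```

Here one strongly confident measure outweighs two weakly dissenting ones under the linear rule, while the majority rule discards the score magnitudes and sides with the two dissenters.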