Title: N-gram and Decision Tree Based Language Identification for Written Words
Authors: Juha Häkkinen, Jilei Tian
Abstract:
The development of multilingual speech recognition systems that combine automatic language identification, language-specific pronunciation modeling, and language-independent acoustic models is becoming increasingly important. When the recognition grammar is dynamic and obtained directly from written text, the language of each grammar item has to be identified from that text alone. This paper describes a text-based language identification system optimized for short words. Two approaches are compared. The n-gram method is first reviewed and then enhanced. We also propose a simple language identification method based on decision trees. The methods are first evaluated in a text-based language identification task. Both are then tested as preprocessors for a multilingual speech recognition task, in which the language of each text item has to be determined in order to choose the correct text-to-pronunciation mapping.