Session: Audio-video Information Retrieval and Digital Archives - Multilingual and Speech-to-Speech Translation

Title: Generating Phonetic Cognates to Handle Named Entities in English-Chinese Cross-Language Spoken Document Retrieval

Authors: Helen Meng, Wai-Kit Lo, Berlin Chen, Karen Tang

Abstract: We have developed a technique for automatic transliteration of named entities for English-Chinese cross-language spoken document retrieval (CL-SDR). An English news story forms a textual query that is automatically translated into Chinese words, which are then mapped into Mandarin syllables by pronunciation dictionary lookup. Mandarin radio news broadcasts form the spoken documents, which are indexed by word and syllable recognition. The information retrieval engine performs matching at both the word and syllable scales. The English queries contain many named entities, which tend to be out-of-vocabulary words for both machine translation and speech recognition and are therefore omitted in retrieval. Names are often transliterated across languages and are generally important for retrieval. We present a technique that takes a name spelling as input and automatically generates a phonetic cognate, expressed as Chinese syllables, to be used in retrieval. Experiments show that the cognates improved retrieval performance.
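The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy pronunciation dictionary, the hard-coded cognate table (standing in for the automatic generation step), the use of overlapping syllable bigrams, and the `syllable_weight` parameter are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy pronunciation dictionary: Chinese word -> Mandarin syllables.
PRON_DICT = {
    "总统": ["zong", "tong"],
    "访问": ["fang", "wen"],
}

# Hypothetical phonetic cognates for names missing from the dictionary.
# The abstract describes generating these automatically from the spelling;
# here they are simply hard-coded for illustration.
NAME_COGNATES = {
    "Clinton": ["ke", "lin", "dun"],
}


def query_to_syllables(terms):
    """Map query terms to syllables, falling back to a cognate for names."""
    syllables = []
    for term in terms:
        if term in PRON_DICT:
            syllables.extend(PRON_DICT[term])
        elif term in NAME_COGNATES:
            syllables.extend(NAME_COGNATES[term])
        # Terms found in neither table are dropped.
    return syllables


def bigrams(seq):
    """Return overlapping pairs of adjacent units."""
    return [tuple(seq[i:i + 2]) for i in range(len(seq) - 1)]


def overlap(query_units, doc_units):
    """Count units shared by query and document, respecting multiplicity."""
    q, d = Counter(query_units), Counter(doc_units)
    return sum(min(q[u], d[u]) for u in q)


def score(query_terms, doc_words, doc_syllables, syllable_weight=0.5):
    """Combine word-scale and syllable-scale matching (assumed linear mix)."""
    word_score = overlap(query_terms, doc_words)
    syllable_score = overlap(
        bigrams(query_to_syllables(query_terms)), bigrams(doc_syllables)
    )
    return (1 - syllable_weight) * word_score + syllable_weight * syllable_score
```

In this sketch, a name such as "Clinton" contributes nothing at the word scale when the recognizer's vocabulary lacks it, but its syllable cognate can still match the syllable transcription of the spoken document.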

a01hm081.ps a01hm081.pdf