Title: ENHANCED MAP ADAPTATION OF N-GRAM LANGUAGE MODELS USING INDIRECT CORRELATION OF DISTANT WORDS
Authors: Takaaki Moriya, Keikichi Hirose, Nobuaki Minematsu, Hui Jiang
Abstract:
We propose a heuristic method for adapting n-gram language models to a new domain. In addition to conventional n-gram correlation, which captures only direct, surface-level information about adjacent words, the method uses indirect correlation between words that are distant from each other. Adding the correlation of distant words gives the adapted models more information about word co-occurrence in the target domain and reduces their perplexity. Furthermore, because the new correlation covers indirect relations that do not appear in the surface sentences, the adapted models still work well in domains somewhat different from the target domain. Experiments show that, compared with well-known MAP-based adaptation, the proposed method improves perplexity reduction by approximately 10%, both in the target domain and in another domain.
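For readers unfamiliar with the baseline, the following is a minimal sketch of one common form of MAP adaptation for a bigram model, together with the perplexity measure used for evaluation. It is an illustration under stated assumptions, not the authors' implementation: the pseudo-count weight `tau`, the add-one smoothing of the background model, the toy corpora, and all function names are hypothetical. The distant-word correlation proposed here is not specified in the abstract and is therefore not shown.

```python
import math
from collections import Counter


def bigram_counts(tokens):
    """Return (bigram counts, history counts) for a list of tokens."""
    pairs = Counter(zip(tokens, tokens[1:]))
    hist = Counter(tokens[:-1])  # each history token starts exactly one bigram
    return pairs, hist


def background_prob(w, h, bg, vocab_size):
    """Add-one smoothed bigram probability from the background corpus (assumed smoothing)."""
    pairs, hist = bg
    return (pairs[(h, w)] + 1) / (hist[h] + vocab_size)


def map_prob(w, h, bg, ad, vocab_size, tau=5.0):
    """MAP estimate: adaptation counts plus tau pseudo-counts drawn from the background model."""
    ad_pairs, ad_hist = ad
    prior = background_prob(w, h, bg, vocab_size)
    return (ad_pairs[(h, w)] + tau * prior) / (ad_hist[h] + tau)


def perplexity(tokens, prob):
    """Perplexity of a token sequence under a bigram probability function prob(w, h)."""
    log_sum = sum(math.log(prob(w, h)) for h, w in zip(tokens, tokens[1:]))
    return math.exp(-log_sum / (len(tokens) - 1))


# Toy corpora (hypothetical): a general-domain text and a small target-domain text.
background = "the market fell and the market rose".split()
adaptation = "the patient rose and the patient slept".split()
vocab = sorted(set(background) | set(adaptation))
V = len(vocab)

bg = bigram_counts(background)
ad = bigram_counts(adaptation)

test = "the patient slept".split()
pp_background = perplexity(test, lambda w, h: background_prob(w, h, bg, V))
pp_adapted = perplexity(test, lambda w, h: map_prob(w, h, bg, ad, V))
print(f"background perplexity: {pp_background:.2f}")
print(f"MAP-adapted perplexity: {pp_adapted:.2f}")
```

In this sketch, a larger `tau` keeps the adapted model closer to the background model, while a smaller `tau` lets the adaptation counts dominate.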