
Abstract: Session SP-17


SP-17.1  

Discriminative Estimation of Interpolation Parameters for Language Model Classifiers
Volker Warnke, Stefan Harbeck, Elmar Noeth, Heinrich Niemann, Michael Levit (University of Erlangen-Nuremberg)

In this paper we present a new approach for estimating the interpolation parameters of language models (LMs) that are used as classifiers. Classical maximum likelihood (ML) estimation theoretically requires a huge amount of data, and the fundamental density assumption has to be correct. Usually one of these conditions is violated, so optimization techniques such as maximum mutual information (MMI) and minimum classification error (MCE) can be used instead, where the interpolation parameters are not optimized on their own but in consideration of all models together. We show how MCE and MMI techniques can be applied to two different kinds of interpolation strategies: linear interpolation, the standard interpolation method, and rational interpolation. We compare ML, MCE, and MMI on the German part of the VERBMOBIL corpus, where we obtain a 3% reduction in classification error when discriminating between 18 dialog act classes.
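
As background for the interpolation strategies discussed in this abstract, the following is a minimal sketch, not the authors' implementation, of linearly interpolating component language models with weights estimated by maximum likelihood (EM) on held-out data; the component models and held-out pairs are hypothetical placeholders, and the paper's MMI/MCE training is not shown.

# Minimal sketch: linear interpolation of language models, with ML (EM)
# estimation of the interpolation weights on held-out data.
def interpolate(models, lambdas, word, history):
    """P(word | history) as a weighted sum of component model probabilities."""
    return sum(lam * m(word, history) for lam, m in zip(lambdas, models))

def em_estimate_lambdas(models, heldout, iterations=20):
    """Maximum likelihood estimation of the interpolation weights via EM.
    `models` are callables returning P(word | history); `heldout` is a list
    of (word, history) pairs (both hypothetical placeholders)."""
    k = len(models)
    lambdas = [1.0 / k] * k
    for _ in range(iterations):
        expected = [0.0] * k
        for word, history in heldout:
            probs = [lam * m(word, history) for lam, m in zip(lambdas, models)]
            total = sum(probs) or 1e-12
            for i, p in enumerate(probs):
                expected[i] += p / total          # posterior weight of component i
        norm = sum(expected)
        lambdas = [e / norm for e in expected]    # re-estimated weights
    return lambdas

In the classifier setting of the paper, one such interpolated model would be trained per dialog-act class, and the class whose model gives the input the highest likelihood would be chosen.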


SP-17.2  

Combination of Words and Word Categories in Varigram Histories
Reinhard Blasig (Philips Research Laboratories)

This paper presents a new kind of language model: category/word varigrams. This special model type permits a tight integration of word-based and category-based modeling of word sequences. Any succession of words and word categories may be employed to describe a given word history. This provides much greater flexibility than previous combinations of word-based and category-based language models. Experiments on the WSJ0 corpus and the 1994 ARPA evaluation data indicate that the category/word varigram yields a perplexity reduction of up to 10 percent compared to a word varigram of the same size, and improves the word error rate (WER) by 7 percent. Compared to a linear interpolation of a word-based and a category-based n-gram, the WER improvement is about 4 percent.
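
To illustrate the mixed word/category histories described above, here is a rough sketch of one possible back-off over candidate histories in which each position holds either the word itself or its category; this is only my reading of the abstract, and the word-to-category map, count table, and specificity ordering are placeholder assumptions rather than the paper's algorithm.

from itertools import product

# Hypothetical word -> category map and history counts; in the varigram
# model these would be learned from the training corpus.
category = {"the": "DET", "cat": "NOUN", "sat": "VERB"}
history_counts = {("DET", "cat"): 12, ("the", "NOUN"): 3}

def candidate_histories(words):
    """Every way of writing the history with words or their categories,
    most word-specific variants first."""
    options = [(w, category.get(w, w)) for w in words]
    cands = [tuple(choice) for choice in product(*options)]
    cands.sort(key=lambda h: sum(1 for x, w in zip(h, words) if x == w),
               reverse=True)
    return cands

def best_history(words, min_count=5):
    """Pick the most specific mixed history that is well observed,
    falling back to the all-category history otherwise."""
    for h in candidate_histories(words):
        if history_counts.get(h, 0) >= min_count:
            return h
    return tuple(category.get(w, w) for w in words)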


SP-17.3  

Multi-Class Composite N-gram Based on Connection Direction
Hirofumi Yamamoto, Yoshinori Sagisaka (ATR-ITL)

A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting word-neighboring characteristics into word-preceding and word-following directions, multiple (two-dimensional) word classes are assigned to each word. On each side, word classes are merged into larger clusters independently, according to the preceding or following word distributions. This word clustering provides more efficient and statistically reliable word clusters. Further, we extend it to the Multi-Class Composite N-gram, whose units are Multi-Class 2-grams and joined words. The Multi-Class Composite N-gram showed better performance in both perplexity and recognition rate, at one thousandth the size of conventional word 2-grams.
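
The two-sided class decomposition suggested by this abstract can be sketched as follows; this is my reading, not the authors' code, and the class maps and probability tables are illustrative placeholders. Each word carries one class for the word-preceding direction and one for the word-following direction, and the class 2-gram probability factors through them.

# Hypothetical two-dimensional class assignment.
left_class  = {"york": "L_PLACE", "angeles": "L_PLACE"}    # class used when the word is predicted
right_class = {"new": "R_MODIFIER", "los": "R_MODIFIER"}   # class used when the word is the context

# Placeholder distributions, normally estimated from corpus counts.
p_class_transition = {("R_MODIFIER", "L_PLACE"): 0.4}   # P(left class of w | right class of previous word)
p_word_given_class = {("york", "L_PLACE"): 0.5}         # P(w | left class of w)

def multiclass_bigram(word, prev):
    """P(word | prev), factored through the two direction-dependent classes."""
    c_prev = right_class.get(prev, prev)
    c_word = left_class.get(word, word)
    return (p_class_transition.get((c_prev, c_word), 1e-6)
            * p_word_given_class.get((word, c_word), 1e-6))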


SP-17.4  

A Class-based Language Model for Large-vocabulary Speech Recognition Extracted from Part-of-Speech Statistics
Christer Samuelsson, Wolfgang Reichl (Lucent Technologies)

A novel approach is presented to class-based language modeling based on part-of-speech statistics. It uses a deterministic word-to-class mapping, which handles words with alternative part-of-speech assignments through the use of ambiguity classes. The predictive power of word-based language models and the generalization capability of class-based language models are combined using both linear interpolation and word-to-class backoff, and both methods are evaluated. Since each word belongs to precisely one ambiguity class, an exact word-to-class backoff model can easily be constructed. Empirical evaluations on large-vocabulary speech-recognition tasks show perplexity improvements and significant reductions in word error rate.
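
A minimal sketch of the word-to-class backoff idea, under my interpretation of the abstract and with hypothetical count tables: use the word trigram when it has been observed, otherwise back off to the ambiguity-class trigram scaled by the word's probability within its class. A real backoff model would also discount the word-level estimates so the distribution normalizes; that bookkeeping is omitted here.

# Hypothetical count tables; in practice these come from the training corpus.
word_trigram_counts  = {}   # (w2, w1, w) -> count
word_bigram_counts   = {}   # (w2, w1) -> count
class_trigram_counts = {}   # (c2, c1, c) -> count
class_bigram_counts  = {}   # (c2, c1) -> count
word_given_class     = {}   # (w, c) -> P(w | c)
ambiguity_class      = {}   # deterministic word -> ambiguity-class map

def backoff_prob(w, w1, w2):
    """P(w | w2 w1): word trigram if observed, otherwise class trigram
    times the within-class word probability."""
    if word_trigram_counts.get((w2, w1, w), 0) > 0:
        return word_trigram_counts[(w2, w1, w)] / word_bigram_counts[(w2, w1)]
    c, c1, c2 = (ambiguity_class.get(x, "UNK") for x in (w, w1, w2))
    class_prob = (class_trigram_counts.get((c2, c1, c), 0)
                  / max(class_bigram_counts.get((c2, c1), 1), 1))
    return class_prob * word_given_class.get((w, c), 1e-6)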


SP-17.5  

Improved Topic-Dependent Language Modeling using Information Retrieval Techniques
Milind Mahajan (Microsoft Research), Doug Beeferman (Carnegie Mellon University), X.D. Huang (Microsoft Research)

N-gram language models are frequently used by speech recognition systems to constrain and guide the search. N-gram models use only the last N-1 words to predict the next word; typical values of N range from 2 to 4. N-gram language models thus lack long-term context information. We show that the predictive power of N-gram language models can be improved by using long-term context information about the topic of discussion. We use information retrieval techniques to generalize the available context information for topic-dependent language modeling. We demonstrate the effectiveness of this technique by performing experiments on the Wall Street Journal text corpus, which is a relatively difficult task for topic-dependent language modeling since the text is relatively homogeneous. The proposed method can reduce the perplexity of the baseline language model by 37%, indicating the predictive power of the topic-dependent language model.
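
The retrieve-then-adapt loop described in this abstract could look roughly like the following sketch, restricted to unigrams for brevity; the helper names, the TF-IDF retrieval, and the interpolation weight are assumptions of mine, not details taken from the paper.

import math
from collections import Counter

def tfidf(counts, doc_freq, n_docs):
    """Simple TF-IDF vector from raw term counts."""
    return {t: c * math.log(n_docs / (1 + doc_freq.get(t, 0)))
            for t, c in counts.items()}

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def topic_adapted_unigram(context_words, documents, doc_freq, lam=0.7, top_k=50):
    """Use the recent context as a query, retrieve the most similar training
    documents, and interpolate a topic unigram with the baseline unigram."""
    n_docs = len(documents)
    query = tfidf(Counter(context_words), doc_freq, n_docs)
    ranked = sorted(documents,
                    key=lambda d: cosine(query, tfidf(Counter(d), doc_freq, n_docs)),
                    reverse=True)[:top_k]
    topic_counts = Counter(w for d in ranked for w in d)
    base_counts = Counter(w for d in documents for w in d)
    topic_total = sum(topic_counts.values()) or 1
    base_total = sum(base_counts.values()) or 1
    def prob(w):
        return (lam * base_counts[w] / base_total
                + (1 - lam) * topic_counts[w] / topic_total)
    return prob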


SP-17.6  

Smoothing Methods in Maximum Entropy Language Modeling
Sven C Martin, Hermann Ney, Joerg Zaplo (Lehrstuhl fuer Informatik VI, RWTH Aachen, University of Technology, D-52056 Aachen, Germany)

This paper discusses various aspects of smoothing techniques in maximum entropy language modeling, a topic not sufficiently covered by previous publications. We show (1) that straightforward maximum entropy models with nested features, e.g. tri-, bi-, and unigrams, result in unsmoothed relative-frequency models; (2) that maximum entropy models with nested features and discounted feature counts approximate back-off smoothed relative-frequency models with Kneser's advanced marginal back-off distribution, which explains some of the reported success of maximum entropy models in the past; and (3) perplexity results for nested and non-nested features, e.g. trigrams and distance trigrams, on a 4-million-word subset of the Wall Street Journal corpus, showing that the smoothing method has more effect on perplexity than the method used to combine information.
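
For reference, the back-off smoothing that the paper relates maximum entropy models to can be sketched as follows: a toy bigram model with absolute discounting (interpolated form) and a Kneser-style lower-order distribution, in which the back-off probability of a word is proportional to its number of distinct predecessors rather than its raw frequency. The discount value and tables are illustrative assumptions only.

from collections import Counter, defaultdict

def kneser_bigram(corpus_bigrams, d=0.75):
    """Toy absolutely discounted bigram with a Kneser-style back-off unigram.
    `corpus_bigrams` is a list of (history_word, word) pairs."""
    bigram_counts = Counter(corpus_bigrams)
    context_counts = Counter(h for h, w in corpus_bigrams)
    predecessors = defaultdict(set)
    for h, w in corpus_bigrams:
        predecessors[w].add(h)
    total_types = sum(len(p) for p in predecessors.values())
    p_kneser = {w: len(p) / total_types for w, p in predecessors.items()}

    def prob(w, h):
        c_hw, c_h = bigram_counts[(h, w)], context_counts[h]
        if c_h == 0:
            return p_kneser.get(w, 1e-9)
        distinct_followers = sum(1 for (hh, _) in bigram_counts if hh == h)
        backoff_weight = d * distinct_followers / c_h    # redistributed mass
        return max(c_hw - d, 0) / c_h + backoff_weight * p_kneser.get(w, 1e-9)
    return prob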


SP-17.7  

Efficient Sampling and Feature Selection in Whole Sentence Maximum Entropy Language Models
Stanley Chen, Ronald Rosenfeld (Carnegie Mellon University)

Conditional Maximum Entropy models have been successfully applied to estimating language model probabilities of the form p(w|h), but are often too demanding computationally. Furthermore, the conditional framework does not lend itself to expressing global sentential phenomena. We have recently introduced a non-conditional Maximum Entropy language model which directly models the probability of an entire sentence or utterance. The model treats each utterance as a "bag of features," where features are arbitrary computable properties of the sentence. Using the model is computationally straightforward since it does not require normalization. Training the model requires efficient sampling of sentences from an exponential distribution. In this paper, we further develop the model and demonstrate its feasibility and power. We compare the efficiency of several sampling techniques, implement smoothing to accommodate rare features, and suggest an efficient algorithm for improving convergence rate. We then present a novel procedure for feature selection, which exploits discrepancies between the existing model and the training corpus. We demonstrate our ideas by constructing and analyzing competitive models in the Switchboard domain.
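
A minimal sketch of the model form described above, under my reading of the abstract and with a hypothetical baseline and feature set: the model assigns each sentence an unnormalized score equal to a baseline LM probability times the exponential of a weighted feature sum, and sentences sampled from the baseline can be reweighted to estimate feature expectations under the model.

import math

def sentence_log_score(sentence, baseline_logprob, features, lambdas):
    """Log of the unnormalized whole-sentence MaxEnt score:
    log P0(s) + sum_i lambda_i * f_i(s)."""
    return baseline_logprob(sentence) + sum(
        lam * f(sentence) for lam, f in zip(lambdas, features))

def importance_weights(samples, features, lambdas):
    """For sentences drawn from the baseline P0, the normalized importance
    weights exp(sum_i lambda_i f_i(s)) estimate feature expectations under
    the MaxEnt model (log-sum-exp trick for numerical stability)."""
    log_w = [sum(lam * f(s) for lam, f in zip(lambdas, features)) for s in samples]
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    z = sum(w)
    return [x / z for x in w]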


SP-17.8  

A Maximum Entropy Language Model Integrating N-Gram and Topic Dependencies for Conversational Speech Recognition
Sanjeev P. Khudanpur, Jun Wu (Johns Hopkins University)

A compact language model which incorporates local dependencies in the form of N-grams and long-distance dependencies through dynamic topic-conditional constraints is presented. These constraints are integrated using the maximum entropy principle. Issues in assigning a topic to a test utterance are investigated. Recognition results on the Switchboard corpus are presented, showing that with a very small increase in the number of model parameters, reductions in word error rate and language model perplexity are achieved over trigram models. Some analysis follows, demonstrating that the gains are even larger on content-bearing words. The results are compared with those obtained by interpolating topic-independent and topic-specific N-gram models. The framework presented here extends easily to incorporate other forms of statistical dependencies, such as syntactic word-pair relationships or hierarchical topic constraints.
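
The following sketch shows, under assumptions of mine rather than details from the paper, how N-gram and topic-conditional constraints can coexist as features of one conditional maximum entropy model; the feature weights, vocabulary, and the separately assigned topic label are hypothetical placeholders.

import math

def maxent_next_word_prob(word, history, topic, vocab, weights):
    """p(word | history, topic) with indicator features for the trigram,
    bigram, unigram and a topic-conditional unigram; `weights` maps
    feature tuples to lambda values (hypothetical)."""
    def score(w):
        feats = [("tri", history[-2:], w), ("bi", history[-1:], w),
                 ("uni", w), ("topic", topic, w)]
        return sum(weights.get(f, 0.0) for f in feats)
    logits = {w: score(w) for w in vocab}
    m = max(logits.values())
    z = sum(math.exp(v - m) for v in logits.values())
    return math.exp(logits[word] - m) / z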



