Authors:
Volker Warnke,
Stefan Harbeck,
Elmar Nöth,
Heinrich Niemann,
Michael Levit,
Paper number 1233
Abstract:
In this paper we present a new approach for estimating the interpolation
parameters of language models (LMs) which are used as classifiers. Classical
maximum likelihood (ML) estimation is, in theory, only justified if a huge
amount of training data is available and the underlying density assumption
is correct. Usually at least one of these conditions is violated, so
optimization techniques such as maximum mutual information (MMI) and minimum
classification error (MCE) can be used instead, where the interpolation
parameters are not optimized on their own but jointly, taking all models
into account. We show how MCE and MMI techniques can be applied to two
different kinds of interpolation strategies: linear interpolation, the
standard interpolation method, and rational interpolation. We compare ML,
MCE and MMI on the German part of the VERBMOBIL corpus, where we obtain a
3% reduction in classification error when discriminating between 18 dialog
act classes.
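For reference, and not quoted from the paper, the two interpolation schemes
can be written as follows; the history-dependent weights g_i(h) in the
rational form are an illustrative assumption:

    p_{\mathrm{lin}}(w \mid h) = \sum_i \lambda_i \, p_i(w \mid h), \qquad \lambda_i \ge 0, \; \sum_i \lambda_i = 1

    p_{\mathrm{rat}}(w \mid h) = \frac{\sum_i \lambda_i \, g_i(h) \, p_i(w \mid h)}{\sum_i \lambda_i \, g_i(h)}

MMI and MCE training then adjust the lambda_i (and, in the rational case,
the g_i) of all class-specific models jointly rather than model by model.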
Authors:
Reinhard Blasig,
Paper number 1520
Abstract:
This paper presents a new kind of language model: the category/word varigram.
This model type permits a tight integration of word-based and category-based
modeling of word sequences: any succession of words and word categories may
be employed to describe a given word history. This provides much greater
flexibility than previous combinations
of word-based and category-based language models. Experiments on the
WSJ0 corpus and the 1994 ARPA evaluation data indicate that the category/word
varigram yields a perplexity reduction of up to 10 percent as compared
to a word varigram of the same size, and improves the word error rate
(WER) by 7 percent. Compared to a linear interpolation of a word-based
and a category-based n-gram, the WER improvement is about 4 percent.
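For illustration only (the abstract gives no formal notation), a category/word
varigram conditions each word on a variable-length mixed history in which
every position may be either a word or its category:

    p(w_n \mid w_1 \ldots w_{n-1}) \approx p(w_n \mid x_{n-k} \ldots x_{n-1}), \qquad x_j \in \{\, w_j, \; C(w_j) \,\}

where C(w) denotes the category of w; the word/category choice at each
position and the history length k are determined when the varigram is built.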
Authors:
Hirofumi Yamamoto,
Yoshinori Sagisaka,
Paper number 1646
Abstract:
A new word-clustering technique is proposed to efficiently build statistically
salient class 2-grams from language corpora. By splitting a word's neighboring
characteristics into the preceding and the following direction, multiple
(two-dimensional) word classes are assigned to each word. On each side, word
classes are merged independently into larger clusters according to the
preceding- or following-word distributions. This clustering scheme provides
more efficient and statistically more reliable word clusters. Further, we
extend it to a Multi-Class Composite N-gram whose units are Multi-Class
2-grams and joined words. The Multi-Class Composite N-gram showed better
performance in both perplexity and recognition rate at one-thousandth the
size of conventional word 2-grams.
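A plausible rendering of the resulting Multi-Class 2-gram, assumed here from
the description above, predicts a word through its preceding-direction class
while conditioning on the predecessor's following-direction class:

    p(w_i \mid w_{i-1}) \approx p\bigl(w_i \mid C_{\mathrm{prec}}(w_i)\bigr)\, p\bigl(C_{\mathrm{prec}}(w_i) \mid C_{\mathrm{foll}}(w_{i-1})\bigr)

where C_prec(w) is the class of w induced from its preceding-word
distribution and C_foll(w) the class induced from its following-word
distribution.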
Authors:
Christer Samuelsson,
Wolfgang Reichl,
Paper number 1781
Abstract:
A novel approach to class-based language modeling based on part-of-speech
statistics is presented. It uses a deterministic word-to-class
mapping, which handles words with alternative part-of-speech assignments
through the use of ambiguity classes. The predictive power of word-based
language models and the generalization capability of class-based language
models are combined using both linear interpolation and word-to-class
backoff, and both methods are evaluated. Since each word belongs to
precisely one ambiguity class, an exact word-to-class backoff model
can easily be constructed. Empirical evaluations on large-vocabulary
speech-recognition tasks show perplexity improvements and significant
reductions in word error-rate.
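As a sketch (the exact backoff formula is not given in the abstract), the
deterministic mapping c(w) to a single ambiguity class permits an exact
word-to-class backoff of the form

    p(w \mid h) =
      \begin{cases}
        \tilde{p}(w \mid h) & \text{if the word } n\text{-gram } (h, w) \text{ was observed} \\
        \alpha(h)\, p\bigl(w \mid c(w)\bigr)\, p\bigl(c(w) \mid c(h)\bigr) & \text{otherwise}
      \end{cases}

where c(h) is the class sequence of the history and alpha(h) the usual
backoff weight; because c(.) is deterministic, no summation over alternative
part-of-speech taggings is needed and the model normalizes exactly.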
Authors:
Milind Mahajan,
Doug Beeferman,
X.D. Huang,
Paper number 2391
Abstract:
N-gram language models are frequently used by speech recognition systems
to constrain and guide the search. N-gram models use only the last N-1 words
to predict the next word, with typical values of N ranging from 2 to 4;
they therefore lack long-term context information. We show that the
predictive power of N-gram
language models can be improved by using long-term context information
about the topic of discussion. We use information retrieval techniques
to generalize the available context information for topic-dependent
language modeling. We demonstrate the effectiveness of this technique
by performing experiments on the Wall Street Journal text corpus, which
is a relatively difficult task for topic-dependent language modeling
since the text is quite homogeneous. The proposed method can reduce
the perplexity of the baseline language model by 37%, indicating the
predictive power of the topic-dependent language model.
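A minimal sketch of the general idea described above; the function names,
the TF-IDF weighting, and the interpolation weight are illustrative
assumptions, not details taken from the paper:

    from collections import Counter
    import math

    def build_index(docs):
        """Document frequencies, corpus size, and TF-IDF vectors for tokenized documents."""
        df = Counter()
        for doc in docs:
            df.update(set(doc))
        n = len(docs)
        vecs = [{w: tf * math.log(n / df[w]) for w, tf in Counter(doc).items()}
                for doc in docs]
        return df, n, vecs

    def vectorize(tokens, df, n):
        """TF-IDF vector for the recognition history, using training-set IDF values."""
        return {w: tf * math.log(n / df[w]) for w, tf in Counter(tokens).items() if w in df}

    def cosine(u, v):
        dot = sum(x * v.get(w, 0.0) for w, x in u.items())
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    def topic_unigram(history, docs, df, n, vecs, k=5):
        """Unigram model estimated from the k training documents most similar to the history."""
        q = vectorize(history, df, n)
        top = sorted(range(len(docs)), key=lambda i: cosine(q, vecs[i]), reverse=True)[:k]
        counts = Counter(w for i in top for w in docs[i])
        total = sum(counts.values())
        return lambda w: counts[w] / total if total else 0.0

    def adapted_prob(w, p_baseline, p_topic, lam=0.3):
        """Linear interpolation of the baseline and topic models (lam is a free parameter)."""
        return (1.0 - lam) * p_baseline(w) + lam * p_topic(w)

In a recognizer, the decoded words of the recent history would be passed to
topic_unigram and the adapted probabilities used to rescore hypotheses.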
Authors:
Sven C Martin, Lehrstuhl fuer Informatik VI, RWTH Aachen, University of Technology, D-52056 Aachen, Germany
Hermann Ney, Lehrstuhl fuer Informatik VI, RWTH Aachen, University of Technology, D-52056 Aachen, Germany
Joerg Zaplo, Lehrstuhl fuer Informatik VI, RWTH Aachen, University of Technology, D-52056 Aachen, Germany
Paper number 1703
Abstract:
This paper discusses various aspects of smoothing techniques in maximum
entropy language modeling, a topic not sufficiently covered by previous
publications. We show (1) that straightforward maximum entropy models
with nested features, e.g. tri-, bi-, and unigrams, result in unsmoothed
relative-frequency models; (2) that maximum entropy models with nested
features and discounted feature counts approximate backing-off smoothed
relative-frequency models with Kneser's advanced marginal back-off
distribution, which explains some of the reported success of maximum
entropy models in the past; and (3) perplexity results for nested and
non-nested features, e.g. trigrams and distance-trigrams, on a 4-million-word
subset of the Wall Street Journal corpus, showing that the smoothing method
has a greater effect on perplexity than the method used to combine
information.
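For concreteness, the conditional maximum entropy model with n-gram features
f_i referred to above has the standard exponential form

    p_\Lambda(w \mid h) = \frac{\exp\bigl(\sum_i \lambda_i f_i(h, w)\bigr)}{\sum_{w'} \exp\bigl(\sum_i \lambda_i f_i(h, w')\bigr)}

and is trained so that the model's expectation of each feature matches its
training target; replacing the raw target N(f_i)/N by a discounted value
such as (N(f_i) - d_i)/N (notation assumed here) is what yields the smoothing
effect discussed in point (2).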
Authors:
Stanley F. Chen,
Ronald Rosenfeld,
Paper number 2189
Abstract:
Conditional Maximum Entropy models have been successfully applied to
estimating language model probabilities of the form p(w|h), but are
often too demanding computationally. Furthermore, the conditional framework
does not lend itself to expressing global sentential phenomena. We
have recently introduced a non-conditional Maximum Entropy language
model which directly models the probability of an entire sentence or
utterance. The model treats each utterance as a "bag of features,"
where features are arbitrary computable properties of the sentence.
Using the model is computationally straightforward since it does not
require normalization. Training the model requires efficient sampling
of sentences from an exponential distribution. In this paper, we further
develop the model and demonstrate its feasibility and power. We compare
the efficiency of several sampling techniques, implement smoothing
to accommodate rare features, and suggest an efficient algorithm for
improving the convergence rate. We then present a novel procedure for feature
selection, which exploits discrepancies between the existing model
and the training corpus. We demonstrate our ideas by constructing and
analyzing competitive models in the Switchboard domain.
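The whole-sentence model referred to above is commonly written as
p(s) = (1/Z)\, p_0(s) \exp\bigl(\sum_i \lambda_i f_i(s)\bigr), with p_0 a
baseline (e.g. n-gram) sentence distribution and f_i arbitrary sentence
features. The sketch below shows one of the sampling options such a model
admits, a Metropolis independence sampler that uses p_0 itself as the
proposal; the interfaces and names are illustrative assumptions rather than
the authors' implementation:

    import math
    import random

    def sample_sentences(propose, sentence_features, lambdas, num_samples):
        """Metropolis independence sampler for p(s) proportional to
        p0(s) * exp(sum_i lambda_i * f_i(s)), using the baseline model p0
        (via propose()) as the proposal distribution."""
        def log_weight(sentence):
            # log p(s) - log p0(s) = sum_i lambda_i * f_i(s); the p0 terms
            # cancel in the acceptance ratio when the proposal is p0 itself.
            feats = sentence_features(sentence)
            return sum(lambdas.get(name, 0.0) * value for name, value in feats.items())

        current = propose()
        current_lw = log_weight(current)
        samples = []
        for _ in range(num_samples):
            candidate = propose()
            candidate_lw = log_weight(candidate)
            # Accept with probability min(1, exp(candidate_lw - current_lw)).
            if math.log(random.random() + 1e-300) < candidate_lw - current_lw:
                current, current_lw = candidate, candidate_lw
            samples.append(current)
        return samples

Sentences drawn this way can be used to estimate the feature expectations
needed for the parameter updates.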
Authors:
Sanjeev P Khudanpur,
Jun Wu,
Paper number 2192
Abstract:
A compact language model which incorporates local dependencies in the
form of N-grams and long distance dependencies through dynamic topic
conditional constraints is presented. These constraints are integrated
using the maximum entropy principle. Issues in assigning a topic to
a test utterance are investigated. Recognition results on the Switchboard
corpus are presented, showing that, with a very small increase in the
number of model parameters, reductions in word error rate and language
model perplexity are achieved over trigram models. Some analysis follows,
demonstrating that the gains are even larger on content-bearing words.
The results are compared with those obtained by interpolating topic-independent
and topic-specific N-gram models. The framework presented here extends
easily to incorporate other forms of statistical dependencies such
as syntactic word-pair relationships or hierarchical topic constraints.
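One way to write such a model (the feature set here is an illustrative
assumption) is a conditional exponential model whose features are the usual
n-grams plus topic-dependent unigrams:

    p(w_i \mid w_{i-2}, w_{i-1}, t) = \frac{\exp\bigl(\lambda_{w_i} + \lambda_{w_{i-1} w_i} + \lambda_{w_{i-2} w_{i-1} w_i} + \lambda_{t, w_i}\bigr)}{Z(w_{i-2}, w_{i-1}, t)}

where t is the topic assigned to the test utterance; only the lambda_{t,w}
terms are added on top of the trigram parameters, which is why the growth
in model size is small.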