Chair: Roni Rosenfeld, Carnegie Mellon University (USA)
P. Srinivasa Rao, IBM T. J. Watson Research Center (USA)
Michael D. Monkowski, IBM T. J. Watson Research Center (USA)
Salim Roukos, IBM T. J. Watson Research Center (USA)
Statistical language models improve the performance of speech recognition systems by providing estimates of the a priori probabilities of word sequences. The commonly used trigram language models estimate the conditional probability of a word given the previous two words from a large corpus of text. The text corpus is often a collection of several small, diverse segments such as newspaper articles or conversations on different topics. Knowledge of the current topic can be used to adapt the general trigram language model to match that topic closely; for example, the general language model can be interpolated with one built on the topic data. We first discuss the adaptation of general trigram language models to a known topic using the minimum discrimination information (MDI) method. We then present results on the Switchboard corpus, which consists of telephone conversations on several topics.
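A minimal sketch of the interpolation idea mentioned above; the weight, probability tables and floor value are illustrative assumptions, not the paper's MDI formulation:

```python
# Hypothetical linear interpolation of a general trigram estimate with a
# topic-specific one; `lam` and the toy tables are assumptions for the
# example, not the MDI adaptation described in the paper.
def interpolated_prob(word, history, p_general, p_topic, lam=0.8, floor=1e-7):
    """P(word | history) as a mixture of general and topic estimates."""
    return (lam * p_general.get((history, word), floor)
            + (1.0 - lam) * p_topic.get((history, word), floor))

# Toy probability tables keyed by (two-word history, next word).
p_general = {(("of", "the"), "court"): 0.010, (("of", "the"), "game"): 0.020}
p_topic   = {(("of", "the"), "game"): 0.150}   # e.g. built from sports-topic text
print(interpolated_prob("game", ("of", "the"), p_general, p_topic))
```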
Masafumi Tamoto, NTT Basic Research Labs (JAPAN)
Takeshi Kawabata, NTT Basic Research Labs (JAPAN)
This paper describes a word clustering technique for stochastic language modeling and reports experimental evidence for its validity. The Binomial Posteriori Distribution (BPD) distance measure between words is introduced; it is based on word co-occurrence and reliability. We plan to consider a practical application of this clustering technique by using each cluster as a Markov state in the construction of a word prediction model.
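The BPD measure itself is not reproduced here; as a rough illustration of distance-based word clustering from co-occurrence statistics, the sketch below uses Jensen-Shannon divergence between toy next-word distributions as a stand-in:

```python
# Stand-in for a co-occurrence-based word distance: Jensen-Shannon divergence
# between toy next-word distributions (the paper's BPD measure, which also
# models reliability, is not reproduced here).
import math

def js_divergence(p, q):
    """Symmetric divergence between two next-word distributions."""
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in set(p) | set(q)}
    def kl(a):
        return sum(pa * math.log(pa / m[k]) for k, pa in a.items() if pa > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

# Toy next-word distributions; the closest pair is a candidate cluster.
contexts = {
    "monday":  {"morning": 0.6, "night": 0.4},
    "tuesday": {"morning": 0.5, "night": 0.5},
    "pizza":   {"slice": 0.7, "box": 0.3},
}
pairs = [(js_divergence(contexts[a], contexts[b]), a, b)
         for a in contexts for b in contexts if a < b]
print(min(pairs))
```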
Sabine Deligne, Telecom Paris (FRANCE)
Frederic Bimbot, Telecom Paris (FRANCE)
The multigram model assumes that language can be described as the output of a memoryless source that emits variable-length sequences of words. The estimation of the model parameters can be formulated as a Maximum Likelihood estimation problem from incomplete data. We show that estimates of the model parameters can be computed through an iterative Expectation-Maximization algorithm and we describe a forward-backward procedure for its implementation. We report the results of a systematic evaluation of multigrams for language modeling on the ATIS database. The objective performance measure is the test set perplexity. Our results show that multigrams outperform conventional n-grams for this task.
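A minimal sketch of the forward recursion implied by the model, with toy sequence probabilities rather than estimates from ATIS:

```python
# Forward recursion for a multigram model: the likelihood of a word string
# is the sum over all segmentations into sequences of at most `max_len`
# words, each emitted independently by a memoryless source.  The sequence
# probabilities below are toy values, not estimates from ATIS.
def multigram_likelihood(words, seq_prob, max_len=3):
    n = len(words)
    alpha = [0.0] * (n + 1)      # alpha[t]: likelihood of the first t words
    alpha[0] = 1.0
    for t in range(1, n + 1):
        for l in range(1, min(max_len, t) + 1):
            seq = tuple(words[t - l:t])
            alpha[t] += alpha[t - l] * seq_prob.get(seq, 0.0)
    return alpha[n]

seq_prob = {("show", "me"): 0.020, ("show",): 0.010, ("me",): 0.030,
            ("flights",): 0.005, ("me", "flights"): 0.001}
print(multigram_likelihood(["show", "me", "flights"], seq_prob))
```

The EM re-estimation step, not shown, would accumulate expected sequence counts over all segmentations via a matching backward pass and renormalise them into new sequence probabilities.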
Harvey Lloyd-Thomas, Ensigma Limited
Jerry H. Wright, Ensigma Limited
Gareth J.F. Jones, University of Cambridge (UK)
This paper describes a language model in which context-free grammar rules are integrated into an n-gram framework, complementing it instead of attempting to replace it. This releases the grammar from the aim of parsing sentences overall (which is often undesirable as well as unrealistic), enabling it to be employed selectively in modelling phrases that are identifiable within a flow of speech. Algorithms for model training, and for sentence scoring and interpretation are described. All are based on the principle of summing over paths that span the sentence, but implementation is node-based for efficiency. Perplexity results for this system (using a hierarchy of grammars from empty to full-coverage) are compared with those for n-gram models, and the system is used for re-scoring N-best sentence lists for a speaker-independent recogniser.
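As an illustration of the path-summing principle (not the paper's node-based implementation), the sketch below sums over segmentations in which spans covered by toy grammar phrases use a phrase probability while remaining words fall back to a simple word model:

```python
# Path-summing sketch: spans recognised as (toy) grammar phrases contribute a
# phrase probability, all other words fall back to a simple word probability
# (a unigram here for brevity, rather than the n-gram used in the paper).
def path_sum_score(words, phrase_prob, word_prob):
    n = len(words)
    score = [0.0] * (n + 1)
    score[0] = 1.0
    for t in range(1, n + 1):
        # paths in which word t comes from the fallback word model
        score[t] += score[t - 1] * word_prob.get(words[t - 1], 1e-6)
        # paths in which a grammar phrase ends at position t
        for phrase, p in phrase_prob.items():
            l = len(phrase)
            if t >= l and tuple(words[t - l:t]) == phrase:
                score[t] += score[t - l] * p
    return score[n]

phrase_prob = {("at", "three", "pm"): 0.004}
word_prob = {"arriving": 0.001, "at": 0.020, "three": 0.005, "pm": 0.003}
print(path_sum_score(["arriving", "at", "three", "pm"], phrase_prob, word_prob))
```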
Sheryl R. Young, Carnegie Mellon University (USA)
In real spoken language applications, where speakers interact spontaneously, there is much apparent unpredictability that makes recognition difficult. Multi-speaker spontaneous dialog, where two speakers interact verbally to cooperatively solve a mutual, shared problem, is more varied than human-computer interaction. Spontaneous speech is not well structured, exhibiting mid-utterance corrections and restarts. Discourse contains digressions, clarifications, corrections and topic changes. Multi-speaker discourse is even more varied, with initiative effects and speakers interacting, planning and responding. This makes it extremely difficult to develop grammars and language models with adequate coverage and reliable stochastic parameters. Perplexity increases and recognition degrades considerably vis-a-vis human-database dialog. In spite of all this, multi-speaker dialogs are structured and predictable when the discourse is appropriately modelled. We have developed heuristics to model spontaneous speech and multi-speaker dialogs [4,8]. The underlying heuristics have been evaluated and shown to adequately and accurately predict discourse phenomena on a corpus of more than 10,000 utterances. The heuristics for computing discourse structure and deriving constraints from it are rule based. We have used these rules to develop a set of stochastic recursive transition networks (RTNs) that capture both the rules and the corpus probabilities. The resulting language model can be used predictively to dynamically generate stochastic utterance predictions, or it can be incorporated into any recognition/understanding system where a single prior state is maintained.
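A much-simplified, flat stand-in for such predictive use of a stochastic transition network; the discourse states, moves and probabilities are invented for the example:

```python
# Simplified stand-in for a stochastic transition network over discourse
# states: each state carries a distribution over the next dialogue move,
# which can be used to re-weight recogniser hypotheses.  States, moves and
# probabilities are invented, not taken from the paper.
transitions = {
    "propose_step": {"accept": 0.55, "clarify": 0.25, "counter": 0.20},
    "clarify":      {"answer": 0.70, "restate": 0.30},
}

def predict_next(state):
    """Return predicted next moves, most probable first."""
    return sorted(transitions[state].items(), key=lambda kv: -kv[1])

print(predict_next("propose_step"))
```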
Reinhard Kneser, Philips GmbH Research Laboratories
Hermann Ney, RWTH Aachen University of Technology (GERMANY)
In stochastic language modeling, backing-off is a widely used method to cope with the sparse-data problem. In the case of unseen events, this method backs off to a less specific distribution. In this paper we propose to use distributions which are especially optimized for the task of backing-off. Two different theoretical derivations lead to distributions which are quite different from the probability distributions usually used for backing-off. Experiments show an improvement of about 10% in terms of perplexity and 5% in terms of word error rate.
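One way to build a lower-order distribution specifically for backing-off is to base it on how many distinct contexts a word completes rather than on its raw frequency; the toy sketch below illustrates that idea only, and omits the discounting details and the paper's derivations:

```python
# Toy illustration: a backing-off distribution based on the number of distinct
# contexts a word completes.  The bigram list is invented and the discounting
# of the higher-order model is omitted.
from collections import defaultdict

bigrams = [("san", "francisco"), ("los", "angeles"), ("new", "york"),
           ("the", "house"), ("a", "house"), ("my", "house")]

contexts_of = defaultdict(set)
for h, w in bigrams:
    contexts_of[w].add(h)

total_types = sum(len(c) for c in contexts_of.values())

def backoff_prob(w):
    """Fraction of distinct bigram types that end in w."""
    return len(contexts_of[w]) / total_types

# "francisco" occurs only after "san", so it receives little backing-off
# mass even if it is frequent; "house" follows many different contexts.
print(backoff_prob("francisco"), backoff_prob("house"))
```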
G. Bordel, Universidad del Pais Vasco
I. Torres, Universidad del Pais Vasco
E. Vidal, Universidad Politecnica de Valencia (SPAIN)
N-grams have been extensively and successfully used for language modelling in continuous speech recognition tasks. On the other hand, it has recently been shown that k-testable stochastic languages (k-TS) are strictly equivalent to N-grams. A major problem to be solved when using a language model is the estimation of the probabilities of events not represented in the training corpus, i.e. unseen events. The aim of this work is to improve other well-established smoothing procedures by interpolating models with different levels of complexity (Quality Weighted Interpolation, QWI). The effect of QWI was experimentally evaluated over a set of back-off smoothed k-TS language models. These experiments were carried out over several corpora using the test-set perplexity as an evaluation criterion. In all cases the introduction of QWI resulted in a reduction of the test-set perplexity.
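For reference, the test-set perplexity used as the evaluation criterion here is the inverse geometric mean of the probabilities the model assigns to the test words; a minimal sketch with a placeholder model:

```python
# Test-set perplexity: the inverse geometric mean of the probabilities a
# model assigns to the test words.  The model below is a placeholder.
import math

def perplexity(test_words, prob_fn):
    log_sum = sum(math.log(prob_fn(w)) for w in test_words)
    return math.exp(-log_sum / len(test_words))

# A uniform model over a 1000-word vocabulary gives perplexity 1000.
print(perplexity(["any", "test", "text"], lambda w: 1.0 / 1000))
```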
Daniel Jurafsky, University of California at Berkeley
Chuck Wooters, Department of Defense
Jonathan Segal, University of California at Berkeley
Andreas Stolcke, SRI International
Eric Fosler, University of California at Berkeley
Gary Tajchman, Voice Processing Corporation
Nelson Morgan, University of California at Berkeley (USA)
This paper describes a number of experiments in adding new grammatical knowledge to the Berkeley Restaurant Project (BeRP), our medium-vocabulary (1,300 word), speaker-independent, spontaneous continuous-speech understanding system (Jurafsky et al., 1994). We describe an algorithm for using a probabilistic Earley parser and a stochastic context-free grammar (SCFG) to generate word transition probabilities at each frame for a Viterbi decoder. We show that using an SCFG as a language model improves word error rate from 34.6% (bigram) to 29.6% (SCFG), and semantic sentence recognition error from 39.0% (bigram) to 34.1% (SCFG). In addition, we obtain a further reduction to 28.8% word error by mixing the bigram and SCFG LMs. We also report preliminary results from using discourse-context information in the LM.
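A toy sketch of how per-word language-model transition probabilities can enter a Viterbi-style update at a word boundary; the scores, the LM weight and the candidate set are invented, and the Earley-based SCFG prediction itself is not shown:

```python
# Combining acoustic scores with language-model transition probabilities at a
# word boundary.  All values are invented for illustration.
import math

def best_word_transition(prev_score, acoustic_logp, lm_prob, lm_weight=8.0):
    """Pick the best next word given acoustic and language-model evidence."""
    best = None
    for w, lp in lm_prob.items():
        score = prev_score + acoustic_logp[w] + lm_weight * math.log(lp)
        if best is None or score > best[1]:
            best = (w, score)
    return best

lm_prob = {"fee": 0.01, "three": 0.20}          # e.g. from the bigram/SCFG mix
acoustic_logp = {"fee": -40.0, "three": -42.0}  # acoustics slightly prefer "fee"
print(best_word_transition(0.0, acoustic_logp, lm_prob))
```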
Klaus Ries, University of Karlsruhe (GERMANY)
Finn Dag Buø, University of Karlsruhe (GERMANY)
Ye-Yi Wang, Carnegie Mellon University (USA)
Alex Waibel, University of Karlsruhe (GERMANY)
A new method for the unsupervised acquisition of structural text models typically reduces corpus perplexity by more than 30% compared to advanced n-gram models. The method is based on new algorithms for the classification of words and phrases from context and on new sequence-finding procedures. These procedures are designed to work fast and accurately on small and large corpora. They are iterated to build a structural model of a corpus. The structural model can be applied to rescore the hypotheses of a speech recognizer and improves the word accuracy. Further applications that exploit the structure-finding capabilities of this model, such as preprocessing for neural networks and (hidden) Markov models in language processing, are proposed.
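One ingredient of such structure finding, sketched with a stand-in association measure (frequency-weighted pointwise mutual information) rather than the paper's exact procedures: merge the most strongly associated adjacent word pair into a phrase token, then iterate.

```python
# Find the most strongly associated adjacent word pair (frequency-weighted
# pointwise mutual information, a stand-in measure) and merge it into a
# single phrase token; repeating this builds up phrase structure.
import math
from collections import Counter

def best_phrase(tokens):
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    def score(pair):
        a, b = pair
        return bigrams[pair] * math.log(bigrams[pair] * n / (unigrams[a] * unigrams[b]))
    return max(bigrams, key=score)

corpus = "i want to fly to new york from new york tonight".split()
pair = best_phrase(corpus)
merged = " ".join(corpus).replace(" ".join(pair), "_".join(pair)).split()
print(pair, merged)
```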
Claudia Pateras, McGill University (CANADA)
Gregory Dudek, McGill University (CANADA)
Renato De Mori, McGill University (CANADA)
In the domain of mobile robotic task execution under dialogue control, a primary goal is to identify the task target which is specified by a natural language description. A number of concepts are expressed in the user spoken language by vague terms like "the big box" and "very close to the door." We use fuzzy logic to map these vague terms onto the quantitative data collected by system sensors. Fuzziness may cause uncertainty in interpretation and, in particular, in understanding references. This uncertainty is abated by collecting additional information through queries to the user and autonomous sensing. Entropy is used to select the queries having the greatest discriminatory power among referent candidates. In addition, we examine the trade-off between querying, sensing and uncertainty. A framework to deal with each of these issues has been developed and will be presented.
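A small sketch of entropy-guided query selection over referent candidates with fuzzy membership values; the candidates, attributes and values are invented for the example:

```python
# Pick the attribute to ask about whose yes/no answer is expected to reduce
# the entropy over referent candidates the most.  All values are invented.
import math

candidates = {
    "box_1": {"big": 0.9, "near_door": 0.2},
    "box_2": {"big": 0.8, "near_door": 0.9},
    "box_3": {"big": 0.1, "near_door": 0.8},
}

def entropy(weights):
    total = sum(weights)
    return -sum((w / total) * math.log(w / total, 2) for w in weights if w > 0)

def expected_entropy_after(attribute):
    """Average candidate-set entropy over a yes/no answer about `attribute`."""
    yes = [c[attribute] for c in candidates.values()]
    no = [1.0 - v for v in yes]
    p_yes = sum(yes) / len(yes)
    return p_yes * entropy(yes) + (1.0 - p_yes) * entropy(no)

attributes = next(iter(candidates.values())).keys()
print(min(attributes, key=expected_entropy_after))   # most discriminative query
```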