1:00, SPEECH-P13.1
COMPOSITE BACKGROUND MODELS AND SCORE STANDARDIZATION FOR LANGUAGE IDENTIFICATION SYSTEMS
T. GLEASON, M. ZISSMAN
This paper describes two enhancements to our language identification
system. Composite background (CBG) modeling allows us to identify
target language speech in an environment where labeled background
training data is unavailable or limited. Instead of separate models
for each of the background languages, a single composite background
model is created from all the non-target training speech. Generally,
the CBG system performed about as well as a baseline system containing
a separate model per background language. The average equal error rate
for 12 CBG tests was 13.6% versus 13.4% for the baseline. We have
also developed and tested a standardized confidence scoring function
based on a single-layer perceptron, which has proven capable of robustly
modeling score distributions.
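For orientation, the sketch below illustrates composite background scoring under stated assumptions: GMM acoustic models (sklearn's GaussianMixture standing in for the paper's actual models), synthetic placeholder features, and an illustrative model size; the perceptron-based score standardization is not shown.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_gmm(features, n_components=64):
        """Fit a diagonal-covariance GMM to a (frames x dims) feature matrix."""
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                              random_state=0)
        gmm.fit(features)
        return gmm

    # Placeholder features; in practice these would be cepstral frames.
    rng = np.random.default_rng(0)
    target_features = rng.normal(0.0, 1.0, size=(2000, 13))
    nontarget_feature_list = [rng.normal(m, 1.0, size=(1000, 13)) for m in (1.0, -1.0, 2.0)]

    # Target model: trained on labeled target-language speech.
    # Composite background (CBG) model: trained on ALL pooled non-target speech,
    # replacing one model per background language.
    target_gmm = train_gmm(target_features)
    cbg_gmm = train_gmm(np.vstack(nontarget_feature_list))

    def cbg_score(test_features):
        """Average per-frame log-likelihood ratio: target vs. composite background."""
        return float(np.mean(target_gmm.score_samples(test_features)
                             - cbg_gmm.score_samples(test_features)))

    print(cbg_score(rng.normal(0.0, 1.0, size=(500, 13))))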
1:00, SPEECH-P13.2
IMPROVING TRIGRAM LANGUAGE MODELING WITH THE WORLD WIDE WEB
X. ZHU, R. ROSENFELD
We propose a novel method for using the World Wide Web to acquire trigram
estimates for statistical language modeling. We submit an N-gram as a phrase
query to web search engines. The search engines return the number of web pages
containing the phrase, from which the N-gram count is estimated. The N-gram
counts are then used to form web-based trigram probability estimates. We
discuss the properties of such estimates, and methods to interpolate them with
traditional corpus-based trigram estimates. We show that the interpolated
models improve speech recognition word error rate significantly over a small test set.
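A minimal sketch of this estimation, assuming a hypothetical page_count(phrase) helper that returns the number of pages a search engine reports for an exact-phrase query (no real search-engine API is called here); the interpolation weight is an illustrative placeholder to be tuned on held-out data.

    def page_count(phrase: str) -> int:
        """Hypothetical hook: number of pages reported for an exact-phrase query."""
        raise NotImplementedError("plug in a search-engine phrase-count lookup")

    def web_trigram_prob(w1: str, w2: str, w3: str) -> float:
        """P_web(w3 | w1 w2) from exact-phrase page counts used as N-gram counts."""
        tri = page_count(f'"{w1} {w2} {w3}"')
        bi = page_count(f'"{w1} {w2}"')
        return tri / bi if bi > 0 else 0.0

    def interpolated_trigram_prob(p_corpus: float, p_web: float, lam: float = 0.7) -> float:
        """Linear interpolation of corpus-based and web-based trigram estimates."""
        return lam * p_corpus + (1.0 - lam) * p_web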
1:00, SPEECH-P13.3
DIALOG-CONTEXT DEPENDENT LANGUAGE MODELING COMBINING N-GRAMS AND STOCHASTIC CONTEXT FREE GRAMMARS
K. HACIOGLU, W. WARD
In this paper, we present our research on dialog-dependent language modeling. In accordance with a speech (or sentence) production model in a discourse, we split language modeling into two components, namely
dialog-dependent concept modeling and syntactic modeling. The concept model is conditioned on the last question prompted by the dialog system and is structured using n-grams. The syntactic model, which consists of a collection of stochastic context-free grammars, one for each concept, describes the word sequences that may be used to express the concepts. The resulting LM is evaluated by rescoring N-best lists. We report a significant perplexity improvement with a moderate reduction in word error rate within the context of the CU Communicator system, a dialog system for making travel plans by accessing information about flights, hotels, and car rentals.
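Schematically, the two components combine as follows (a sketch only, with C a concept sequence, q the last system prompt, and W the word string; the paper's exact conditioning and decomposition may differ):

    P(W \mid q) \;=\; \sum_{C} \underbrace{P(C \mid q)}_{\text{prompt-conditioned concept $n$-gram}} \; \underbrace{P(W \mid C)}_{\text{per-concept SCFGs}}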
1:00, SPEECH-P13.4
USE OF NON-NEGATIVE MATRIX FACTORIZATION FOR LANGUAGE MODEL ADAPTATION IN LECTURE TRANSCRIPTION TASK
M. NOVAK, R. MAMMONE
The use of non-negative matrix factorization (NMF) in language model
adaptation is presented. This is an alternative to Latent Semantic
Analysis based language modeling, which relies on Singular Value
Decomposition (SVD). Potential benefits are discussed. A new method,
which does not require an explicit document segmentation of the
training corpus, is presented as well. This method resulted in a
perplexity reduction of 16% on a database of biology lecture transcriptions.
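A minimal sketch of the NMF step on a document-by-word count matrix; the matrix, rank, and pseudo-document below are illustrative placeholders, and how the factors feed the adapted language model in the paper is not reproduced.

    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(0)
    V = rng.poisson(1.0, size=(200, 5000)).astype(float)   # documents x words counts

    nmf = NMF(n_components=50, init="nndsvd", max_iter=400)
    W = nmf.fit_transform(V)        # documents x topics, non-negative
    H = nmf.components_             # topics x words, non-negative

    # A pseudo-document built from recent transcription history can be projected
    # onto the topics and renormalized into an adapted unigram distribution.
    history_counts = V[:1]                      # placeholder 1 x words count vector
    h = nmf.transform(history_counts)           # 1 x topics
    adapted_unigram = (h @ H).ravel()
    adapted_unigram /= adapted_unigram.sum()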
1:00, SPEECH-P13.5
PORTABILITY OF SYNTACTIC STRUCTURE FOR LANGUAGE MODELING
C. CHELBA
The paper presents a study on the portability of statistical syntactic knowledge in the framework of the structured language model (SLM). We investigate the impact of porting SLM statistics from the Wall Street Journal (WSJ) to the Air Travel Information System (ATIS)
domain. We compare this approach to applying the Microsoft rule-based
parser to the ATIS data and to using a small amount of data manually parsed at UPenn for gathering the initial SLM statistics.
Surprisingly, although it performs only modestly in perplexity, the model initialized from WSJ parses outperforms the other initialization methods based on in-domain annotated data, achieving a significant 0.4% absolute and 7% relative reduction in word error rate (WER) over a baseline system whose WER is 5.8%; measured against the minimum WER achievable on the N-best lists we worked with, the improvement is 12%.
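For orientation, the quoted absolute and relative figures are consistent:

    \frac{0.4}{5.8} \approx 0.069 \approx 7\%\ \text{relative, i.e. WER drops from } 5.8\% \text{ to } 5.4\%.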
1:00, SPEECH-P13.6
EFFICIENT CLASS-BASED LANGUAGE MODELLING FOR VERY LARGE VOCABULARIES
E. WHITTAKER, P. WOODLAND
This paper investigates the perplexity and word error rate performance
of two different forms of class model and the respective data-driven
algorithms for obtaining automatic word classifications. The
computational complexity of the algorithm for the `conventional'
two-sided class model is found to be unsuitable for very large
vocabularies (>100k) or large numbers of classes (>2000). A
one-sided class model is therefore investigated and the complexity of
its algorithm is found to be substantially less in such
situations. Perplexity results are reported on both English and
Russian data. For the latter both 65k and 430k vocabularies are
used. Lattice rescoring experiments are also performed on an English
language broadcast news task. These experimental results show that
both models, when interpolated with a word model, perform similarly
well. Moreover, classifications are obtained for the one-sided model
in a fraction of the time required by the two-sided model, especially
for very large vocabularies.
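Schematically, writing c(w) for the class of word w, a standard two-sided class bigram and a one-sided variant that classes only the history can be expressed as follows (a generic sketch; the paper's exact model forms and n-gram orders may differ):

    P_{\text{two-sided}}(w_i \mid w_{i-1}) = P\big(w_i \mid c(w_i)\big)\, P\big(c(w_i) \mid c(w_{i-1})\big),
    \qquad
    P_{\text{one-sided}}(w_i \mid w_{i-1}) = P\big(w_i \mid c(w_{i-1})\big).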
1:00, SPEECH-P13.7
DATA AUGMENTATION AND LANGUAGE MODEL ADAPTATION
D. JANISZEK, R. DE MORI, F. BECHET
A method is presented for augmenting word n-gram counts in a matrix which represents a 2-gram language model (LM). This method is based on numerical distances in a reduced space obtained by Singular Value Decomposition (SVD). Rescoring word lattices in a spoken dialogue application using an LM containing augmented counts has led to a Word Error Rate (WER) reduction of 6.5%. By further interpolating the augmented counts with counts extracted from a very large newspaper corpus, but only for selected histories, a total WER reduction of 11.7% was obtained.
We show that this approach gives better results than a global count interpolation for all histories of the LM.
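A minimal sketch of the reduced-space step such augmentation relies on, with toy counts and an illustrative rank; the actual augmentation rule is only indicated in a comment and is not the paper's exact procedure.

    import numpy as np

    rng = np.random.default_rng(0)
    C = rng.poisson(0.2, size=(1000, 1000)).astype(float)   # bigram counts: history x word

    k = 50
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    H = U[:, :k] * s[:k]                       # reduced-space history representations

    def history_similarity(i: int, j: int) -> float:
        """Cosine similarity between two histories in the reduced space."""
        hi, hj = H[i], H[j]
        return float(hi @ hj / (np.linalg.norm(hi) * np.linalg.norm(hj) + 1e-12))

    # Counts of history i could then be augmented with counts borrowed from its
    # nearest reduced-space neighbours, e.g. C_aug[i] = C[i] + sum_j sim(i, j) * C[j].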
1:00, SPEECH-P13.8
USING SEMANTIC CLASS INFORMATION FOR RAPID DEVELOPMENT OF LANGUAGE MODELS WITHIN ASR DIALOGUE SYSTEMS
E. FOSLER-LUSSIER, H. KUO
When dialogue system developers tackle a new domain, much effort is
required; the development of different parts of the system usually
proceeds independently. Yet it may be profitable to coordinate
development efforts between different modules. Here, we focus our
efforts on extending small amounts of language model training data by
integrating semantic classes that were created for a natural language
understanding module. By converting finite state parses of a training
corpus into a probabilistic context free grammar and subsequently
generating artificial data from the context free grammar, we can
significantly reduce perplexity and ASR word error for situations with
little training data. Experiments are presented using data from the
ATIS and DARPA Communicator travel corpora.
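A minimal sketch of generating artificial training sentences from a PCFG; the toy grammar, its semantic-class nonterminals, and the rule probabilities are invented for illustration and are not the actual grammars derived from the corpora.

    import random

    # Each nonterminal maps to a list of (expansion, probability) pairs;
    # symbols not present as keys are terminals.
    PCFG = {
        "S":       [(["i", "want", "to", "fly", "FROMLOC", "TOLOC"], 0.6),
                    (["show", "me", "flights", "FROMLOC", "TOLOC"], 0.4)],
        "FROMLOC": [(["from", "CITY"], 1.0)],
        "TOLOC":   [(["to", "CITY"], 1.0)],
        "CITY":    [(["boston"], 0.4), (["denver"], 0.35), (["san", "francisco"], 0.25)],
    }

    def sample(symbol: str, rng: random.Random) -> list[str]:
        """Recursively expand a symbol, sampling rules according to their probabilities."""
        if symbol not in PCFG:
            return [symbol]                     # terminal
        expansions, probs = zip(*PCFG[symbol])
        expansion = rng.choices(expansions, weights=probs, k=1)[0]
        return [w for sym in expansion for w in sample(sym, rng)]

    rng = random.Random(0)
    artificial_corpus = [" ".join(sample("S", rng)) for _ in range(5)]
    print(artificial_corpus)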
1:00, SPEECH-P13.9
ON-LINE LEARNING OF LANGUAGE MODELS WITH WORD ERROR PROBABILITY DISTRIBUTIONS
R. GRETTER, G. RICCARDI
We are interested in the problem of learning stochastic language
models on-line (without speech transcriptions) for adaptive speech
recognition and understanding. In this paper we propose an algorithm
to adapt to variations in the language model distributions based on
the speech input only and without its true transcription. The on-line
probability estimate is defined as a function of the prior and word
error distributions. We show the effectiveness of word-lattice-based
error probability distributions in terms of Receiver Operating
Characteristics (ROC) curves and word accuracy. We apply the new
estimates P_{adapt}(w) to the task of adapting on-line an initial
large vocabulary trigram language model and show improvement in word
accuracy with respect to the baseline speech recognizer.
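One purely illustrative way to combine the prior with word error probabilities (not necessarily the paper's definition of P_{adapt}) is to weight each hypothesized occurrence of w in the lattices by its probability of being correct and smooth with the prior:

    \hat{C}(w) = \sum_{\text{occurrences of } w} \big(1 - P_{\text{err}}(w)\big),
    \qquad
    P_{\text{adapt}}(w) = \lambda\, P_{\text{prior}}(w) + (1-\lambda)\, \frac{\hat{C}(w)}{\sum_{v} \hat{C}(v)}.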
1:00, SPEECH-P13.10
CLASSES FOR FAST MAXIMUM ENTROPY TRAINING
J. GOODMAN
Maximum entropy models are considered by many to be one of the most promising avenues of language modeling research. Unfortunately, long training times make maximum entropy research difficult. We present a novel speedup technique: we change the form of the model to use classes. Our speedup works by creating two maximum entropy models, the first of which predicts the class of each word, and the second of which predicts the word itself. This factoring of the model leads to fewer non-zero indicator functions, and faster normalization, achieving speedups of up to a factor of 35 over one of the best previous techniques. It also results in typically slightly lower perplexities. The same trick can be used to speed training of other machine learning techniques, e.g. neural networks, applied to any problem with a large number of outputs, such as language modeling.
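The class factoring can be written as follows, with c(w) the class of word w and h the conditioning history; each factor is itself a maximum entropy model, and normalization runs over the class inventory and over the words within one class rather than over the full vocabulary:

    P(w \mid h) \;=\; P\big(c(w) \mid h\big)\; P\big(w \mid c(w),\, h\big).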