Authors:
Jerome R Bellegarda,
Page (NA) Paper number 1748
Abstract:
A multi-span framework was recently proposed to integrate the various
constraints, both local and global, that are present in the language.
In this approach, local constraints are captured via n-gram language
modeling, while global constraints are taken into account through the
use of latent semantic analysis. The performance of the resulting multi-span
language models, as measured by perplexity, has been shown to compare
favorably with the corresponding n-gram performance. This paper reports
on actual speech recognition experiments, and shows that word error
rate is also substantially reduced. On a subset of the Wall Street
Journal speaker-independent, 20,000-word vocabulary, continuous speech
task, the multi-span framework resulted in a reduction in average word
error rate of up to 17%.
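As a rough illustration of the idea behind multi-span modeling (not the paper's exact formulation), the sketch below linearly interpolates a smoothed trigram probability with a probability derived from an LSA-style cosine similarity between a candidate word and the running document history. All function names, the add-one smoothing, and the exponential normalization are assumptions made for the example.

```python
# Minimal sketch of blending a local n-gram score with a global LSA-style
# semantic score.  Illustrative only; not the paper's integration scheme.
import math

def trigram_prob(word, history, counts3, counts2, vocab_size):
    """Add-one smoothed trigram probability (purely illustrative smoothing)."""
    ctx = tuple(history[-2:])
    return (counts3.get(ctx + (word,), 0) + 1.0) / (counts2.get(ctx, 0) + vocab_size)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def semantic_prob(word, doc_history, word_vecs):
    """Turn similarity with the running document centroid into a distribution
    over the vocabulary (one of several possible normalization choices)."""
    dim = len(next(iter(word_vecs.values())))
    centroid = [0.0] * dim
    for w in doc_history:
        if w in word_vecs:
            centroid = [c + x for c, x in zip(centroid, word_vecs[w])]
    scores = {w: math.exp(cosine(v, centroid)) for w, v in word_vecs.items()}
    z = sum(scores.values())
    return scores.get(word, 0.0) / z

def multispan_prob(word, history, doc_history, counts3, counts2, vocab_size,
                   word_vecs, lam=0.7):
    """Linear interpolation of local and global components; the actual
    multi-span model integrates the two spans more tightly than this."""
    p_local = trigram_prob(word, history, counts3, counts2, vocab_size)
    p_global = semantic_prob(word, doc_history, word_vecs)
    return lam * p_local + (1.0 - lam) * p_global
```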
Authors:
Allen L Gorin,
Giuseppe Riccardi,
Page (NA) Paper number 2045
Abstract:
We are interested in adaptive spoken dialog systems for automated services.
People's spoken language usage varies over time for a fixed task, and
furthermore varies depending on the state of the dialog. We characterize
and quantify this variation based on a database of 20K user transactions
with AT&T's experimental "How May I Help You" spoken dialog system.
We then report on a language adaptation algorithm which was used to
train state-dependent ASR language models, experimentally evaluating
their improved performance with respect to word accuracy and perplexity.
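To make the notion of state-dependent language modeling concrete, the following sketch partitions transcribed utterances by dialog state and compares the perplexity of a per-state unigram model against a pooled model. The data format and the unigram/add-alpha choices are assumptions; the paper's adaptation algorithm is not reproduced here.

```python
# Illustrative sketch only: per-state vs. pooled unigram perplexity.
import math
from collections import Counter, defaultdict

def unigram_model(sentences, vocab_size, alpha=1.0):
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())
    return lambda w: (counts[w] + alpha) / (total + alpha * vocab_size)

def perplexity(model, sentences):
    logp, n = 0.0, 0
    for s in sentences:
        for w in s:
            logp += math.log(model(w))
            n += 1
    return math.exp(-logp / max(n, 1))

def state_dependent_perplexities(transactions, vocab_size):
    """transactions: list of (dialog_state, tokenized_utterance) pairs
    (hypothetical format).  Returns {state: (state-LM ppl, pooled-LM ppl)}."""
    by_state = defaultdict(list)
    for state, utt in transactions:
        by_state[state].append(utt)
    pooled = unigram_model([u for _, u in transactions], vocab_size)
    report = {}
    for state, utts in by_state.items():
        state_lm = unigram_model(utts, vocab_size)
        report[state] = (perplexity(state_lm, utts), perplexity(pooled, utts))
    return report
```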
Authors:
Demetrio Aiello,
Cristina Delogu,
Renato De Mori,
Andrea Di Carlo,
Marina Nisi,
Silvia Tummeacciu,
Page (NA) Paper number 1526
Abstract:
The paper describes a system, implemented in Java, for written and visual
scenario generation used to collect speech corpora in the framework of a Tourism
Information System. Experimental evidence shows that the corpus generated
with visual scenarios has a higher perplexity and a richer vocabulary
than the corpus generated using the same conceptual derivations to
produce textual scenarios. Furthermore, there is evidence that textual
scenarios influence speakers in the choice of the lexicon used to express
the concepts more than visual scenarios do.
Authors:
Amparo Varona,
Ines Torres,
Page (NA) Paper number 1907
Abstract:
A syntactic approach to the well-known N-gram models, the K-Testable
Languages in the Strict Sense (K-TSS), is integrated into a Continuous
Speech Recognition (CSR) system in this work. The use of smoothed
K-TSS regular grammars made it possible to obtain a deterministic Stochastic
Finite State Automaton (SFSA) integrating K k-TSS models into a self-contained
model. An efficient representation of the whole model in a simple array
of adequate size is proposed. This structure can easily be handled
at decoding time by a simple search function through the array. This
formulation strongly reduces the number of parameters to be managed
and thus the computational complexity of the model. An experimental evaluation
of the proposed SFSA representation was carried out over a Spanish
recognition task. These experiments showed important memory savings
when allocating K-TSS language models, more pronounced for higher values
of K. They also showed that the decoding time did not increase meaningfully
when K did. The lowest word error rates for the Spanish task
were achieved for K=4 and K=5. As a consequence, the ability of
this syntactic approach to N-grams to be well integrated into a CSR
system, even for high values of K, has been established.
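The array-based representation can be pictured along the following lines: a sketch, under simplifying assumptions, that stores the transitions of a deterministic finite-state language model in one flat sorted array and resolves them at decoding time with a binary search. The paper's actual layout, back-off structure, and smoothing are not reproduced.

```python
# Minimal sketch of a flat-array transition table with binary-search lookup.
import bisect

def build_transition_array(transitions):
    """transitions: iterable of (state_id, word_id, next_state_id, logprob).
    Returns (keys, values) with keys sorted as (state_id, word_id) pairs."""
    rows = sorted(transitions, key=lambda t: (t[0], t[1]))
    keys = [(s, w) for s, w, _, _ in rows]
    values = [(ns, lp) for _, _, ns, lp in rows]
    return keys, values

def lookup(keys, values, state_id, word_id):
    """Binary search for the (state, word) transition; returns None if absent
    (a real smoothed model would back off instead of failing)."""
    i = bisect.bisect_left(keys, (state_id, word_id))
    if i < len(keys) and keys[i] == (state_id, word_id):
        return values[i]
    return None
```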
Authors:
Mary P Harper,
Michael T Johnson,
Leah H Jamieson,
Stephen A Hockema,
Christopher M White,
Page (NA) Paper number 2403
Abstract:
In this paper, we describe a prototype spoken language system that
loosely integrates a speech recognition component based on hidden Markov
models with a constraint dependency grammar (CDG) parser using a word
graph to pass sentence candidates between the two modules. This loosely
coupled system was able to improve the sentence selection accuracy
and concept accuracy over the level achieved by the acoustic module
with a stochastic grammar. Timing profiles suggest that a tighter coupling
of the modules could reduce parsing times of the system, as could the
development of better acoustic models and tighter parsing constraints
for conjunctions.
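The loose coupling can be illustrated schematically as follows: the recognizer supplies scored sentence candidates (a plain N-best list here stands in for the word graph) and the parser acts as a filter over them. The `parses` callback is hypothetical, and the CDG parser itself is not implemented.

```python
# Illustrative sketch of loosely coupled recognition and parsing.
def select_sentence(nbest, parses):
    """nbest: list of (acoustic_score, word_sequence), higher score is better.
    Returns the best-scoring candidate accepted by the parser, falling back
    to the overall best if the parser rejects everything."""
    ranked = sorted(nbest, key=lambda c: c[0], reverse=True)
    for score, words in ranked:
        if parses(words):
            return words
    return ranked[0][1]
```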
Authors:
Motoyuki Suzuki,
Shozo Makino,
Hirotomo Aso,
Page (NA) Paper number 1925
Abstract:
Statistical language models obtained from a large number of training
samples play an important role in speech recognition. In order to obtain
higher recognition performance, we should introduce long-distance correlations
between words. However, traditional statistical language models such
as word n-grams and ergodic HMMs are insufficient for expressing long-distance
correlations between words. In this paper, we propose an acquisition
method for a language model based on HMnet that takes long-distance
correlations and word location into consideration.
Authors:
Frank Wessel,
Andrea Baader,
Page (NA) Paper number 1385
Abstract:
The use of dialogue-state dependent language models in automatic inquiry
systems can improve speech recognition and understanding if a reasonable
prediction of the dialogue-state is feasible. In this paper, the dialogue-state
is defined as the set of parameters which are contained in the system
prompt. For each dialogue-state a separate language model is constructed.
In order to obtain robust language models despite the small amount
of training data, we propose to interpolate all of the dialogue-state
dependent language models linearly for each dialogue-state and to train
the large number of resulting interpolation weights with the EM algorithm
in combination with leaving-one-out. We present experimental results
on a small Dutch corpus which has been recorded in the Netherlands
with a train timetable information system and show that the perplexity
and the word error rate can be reduced significantly.
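The weight-training step can be sketched as plain EM estimation of linear interpolation weights on held-out events, with the component models passed in as probability functions. The leaving-one-out refinement used in the paper is not shown, and all interfaces here are illustrative assumptions.

```python
# Minimal sketch of EM for linear interpolation weights (no leaving-one-out).
def em_interpolation_weights(components, heldout, iterations=20):
    """components: list of callables p(word, history) -> probability.
    heldout: list of (word, history) events.  Returns mixture weights."""
    m = len(components)
    lam = [1.0 / m] * m
    for _ in range(iterations):
        counts = [0.0] * m
        for word, history in heldout:
            probs = [l * c(word, history) for l, c in zip(lam, components)]
            z = sum(probs) or 1.0
            for i, p in enumerate(probs):
                counts[i] += p / z          # E-step: posterior responsibility
        total = sum(counts) or 1.0
        lam = [c / total for c in counts]   # M-step: renormalize
    return lam
```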
Authors:
Adam Kalai,
Stanley F. Chen,
Avrim Blum,
Ronald Rosenfeld,
Page (NA) Paper number 2175
Abstract:
Multiple language models are combined for many tasks in language modeling,
such as domain and topic adaptation. In this work, we compare on-line
algorithms from machine learning to existing algorithms for combining
language models. On-line algorithms developed for this problem have
parameters that are updated dynamically to adapt to a data set during
evaluation. On-line analysis provides guarantees that these algorithms
will perform nearly as well as the best model chosen in hindsight from
a large class of models, e.g., the set of all static mixtures. We describe
several on-line algorithms and present results comparing these techniques
with existing language modeling combination methods on the task of
domain adaptation. We demonstrate that, in some situations, on-line
techniques can significantly outperform static mixtures (by over 10%
in terms of perplexity) and are especially effective when the nature
of the test data is unknown or changes over time.
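One simple on-line scheme of this flavor is sketched below: mixture weights are updated multiplicatively after each word in proportion to how well each component model predicted it. The paper's specific algorithms and their formal guarantees are not reproduced, and the interfaces are assumptions; the components are assumed to assign strictly positive probability to every event.

```python
# Illustrative on-line mixture with multiplicative weight updates.
import math

def online_mixture_logprob(components, text_events, eta=1.0):
    """components: list of callables p(word, history) -> probability.
    text_events: iterable of (word, history).  Returns the total log-probability
    assigned to the text by the adaptive mixture."""
    m = len(components)
    weights = [1.0 / m] * m
    total_logprob = 0.0
    for word, history in text_events:
        probs = [c(word, history) for c in components]
        mix = sum(w * p for w, p in zip(weights, probs))
        total_logprob += math.log(mix)
        # Reward components that predicted the observed word well.
        weights = [w * (p ** eta) for w, p in zip(weights, probs)]
        z = sum(weights) or 1.0
        weights = [w / z for w in weights]
    return total_logprob
```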