Abstract: Session SP-23
|
SP-23.1
|
Speech Recognition Experiments Using Multi-Span Statistical Language Models
Jerome R Bellegarda (Apple Computer)
A multi-span framework was recently proposed to integrate the various
constraints, both local and global, that are present in the language.
In this approach, local constraints are captured via n-gram language
modeling, while global constraints are taken into account through the
use of latent semantic analysis. The performance of the resulting
multi-span language models, as measured by perplexity, has been shown
to compare favorably with the corresponding n-gram performance.
This paper reports on actual speech recognition experiments, and shows
that word error rate is also substantially reduced. On a subset of the
Wall Street Journal speaker-independent, 20,000-word vocabulary,
continuous speech task, the multi-span framework resulted in a
reduction in average word error rate of up to 17%.
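The local/global blend that the abstract describes can be sketched in a toy form. This is not the authors' method: the blending rule, the mapping of a latent-semantic similarity onto a probability, and all names here are illustrative assumptions.

```python
import math

def multi_span_prob(p_ngram, sim, lam=0.7):
    """Blend a local n-gram probability with a global semantic score.

    p_ngram : P(w | n-gram history), a probability in (0, 1]
    sim     : cosine similarity of the word to the history in a latent
              semantic space, in [-1, 1] (hypothetical input)
    lam     : interpolation weight for the n-gram component
    """
    # Map the similarity onto (0, 1) so it behaves like a probability.
    p_semantic = (sim + 1.0) / 2.0
    return lam * p_ngram + (1.0 - lam) * p_semantic

def perplexity(probs):
    """Perplexity of a sequence of per-word probabilities."""
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))
```

Lower perplexity from the blended probabilities would indicate that the global constraints are adding information the n-gram alone misses.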
|
SP-23.2
|
Spoken Language Variation over Time and State in a Natural Spoken Dialog System
Allen L Gorin,
Giuseppe Riccardi (AT&T-Labs Research)
We are interested in adaptive spoken dialog systems for
automated services. People's spoken language usage varies
over time for a fixed task, and furthermore varies
depending on the state of the dialog. We characterize
and quantify this variation based on a database of
20K user transactions with AT&T's experimental
"How May I Help You" spoken dialog system.
We then report on a language adaptation algorithm
which was used to train state-dependent ASR language models,
experimentally evaluating their improved performance with
respect to word accuracy and perplexity.
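The core idea of state-dependent language modeling can be sketched as a lookup with a fallback. This is an illustration only; the state names and model labels are invented, not taken from the paper.

```python
# Hypothetical sketch: pick a language model keyed by the current
# dialogue state, falling back to a state-independent model when no
# state-specific model was trained.
GENERAL_LM = "general"

state_lms = {
    "greeting": "lm_greeting",
    "confirmation": "lm_confirmation",
}

def select_lm(dialog_state):
    """Return the LM trained for this state, or the general model."""
    return state_lms.get(dialog_state, GENERAL_LM)
```

In a real system, each entry would be a full ASR language model trained on the transactions observed in that state.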
|
SP-23.3
|
Comparative Evaluation of Spoken Corpora Acquired by Presentation of Visual Scenarios and Textual Descriptions
Demetrio Aiello,
Cristina Delogu (Fondazione Ugo Bordoni),
Renato De Mori (Université d'Avignon et des Pays de Vaucluse),
Andrea Di Carlo,
Marina Nisi,
Silvia Tummeacciu (Fondazione Ugo Bordoni)
The paper describes a system, written in Java, for
generating written and visual scenarios, used to collect
speech corpora in the framework of a Tourism Information
System.
Experimental evidence shows that the corpus generated
with visual scenarios has a higher perplexity and a
richer vocabulary than the corpus generated using the
same conceptual derivations to produce textual
scenarios. Furthermore, there is evidence that textual
scenarios influence speakers' choice of the lexicon
used to express the concepts more than visual scenarios
do.
|
SP-23.4
|
Using smoothed K-TSS language models in Continuous Speech Recognition systems
Amparo Varona,
Ines Torres (Departamento de Electricidad y Electronica, Universidad del Pais Vasco)
A syntactic approach to the well-known N-gram models, the K-Testable Language in the Strict Sense (K-TSS), is integrated in this work into a Continuous Speech Recognition (CSR) system. The use of smoothed K-TSS regular grammars made it possible to obtain a deterministic Stochastic Finite State Automaton (SFSA) integrating K k-TSS models into a self-contained model. An efficient representation of the whole model in a simple array of adequate size is proposed. This structure can be easily handled at decoding time by a simple search function through the array. This formulation strongly reduces the number of parameters to be managed and thus the computational complexity of the model. An experimental evaluation of the proposed SFSA representation was carried out over a Spanish recognition task. These experiments showed important memory savings when allocating K-TSS language models, more pronounced for higher values of K. They also showed that the decoding time did not increase meaningfully when K did. The lowest word error rates for the Spanish task tested were achieved for K=4 and 5. As a consequence, the ability of this syntactic approach to the N-grams to be well integrated in a CSR system, even for high values of K, has been established.
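The k-TSS idea can be illustrated for the simplest case, k = 2, where the automaton is equivalent to a bigram model: states are the previous word and arcs carry the next word with a maximum-likelihood probability. The compact array layout the paper proposes is replaced here by a plain dictionary purely for readability; this is a sketch, not the authors' representation.

```python
from collections import defaultdict

def train_2tss(sentences):
    """Build a deterministic 2-TSS (bigram) automaton from sentences."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        words = ["<s>"] + sent + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    # Normalize counts into transition probabilities per state.
    automaton = {}
    for state, arcs in counts.items():
        total = sum(arcs.values())
        automaton[state] = {w: c / total for w, c in arcs.items()}
    return automaton
```

For larger K, states become (K-1)-word histories, and smoothing (as in the paper) is needed to handle unseen transitions.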
|
SP-23.5
|
Interfacing a CDG Parser with an HMM Word Recognizer Using Word Graphs
Mary P Harper,
Michael T Johnson,
Leah H Jamieson,
Stephen A Hockema,
Christopher M White (Purdue University, School of Electrical and Computer Engineering)
In this paper, we describe a prototype spoken language system that
loosely integrates a speech recognition component based on hidden
Markov models with a constraint dependency grammar (CDG) parser using
a word graph to pass sentence candidates between the two modules.
This loosely coupled system was able to improve the sentence selection
accuracy and concept accuracy over the level achieved by the acoustic
module with a stochastic grammar. Timing profiles suggest that a
tighter coupling of the modules could reduce parsing times of the
system, as could the development of better acoustic models and tighter
parsing constraints for conjunctions.
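A word graph of the kind used to pass sentence candidates between the two modules can be sketched as a small directed acyclic graph. The graph contents and node names below are invented for illustration; enumerating its paths yields the candidate sentences a parser would then filter.

```python
def expand(graph, node="start", prefix=()):
    """Yield every word sequence from `node` to the `end` node."""
    if node == "end":
        yield list(prefix)
        return
    for word, nxt in graph[node]:
        yield from expand(graph, nxt, prefix + (word,))

# Hypothetical word graph: each node maps to (word, next-node) arcs.
word_graph = {
    "start": [("move", "n1")],
    "n1": [("the", "n2"), ("a", "n2")],
    "n2": [("tanker", "end"), ("anchor", "end")],
}
```

The compactness of the graph is the point: four sentence hypotheses are encoded with only five arcs, which is what makes passing many candidates between recognizer and parser affordable.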
|
SP-23.6
|
An automatic acquisition method of a statistical finite-state automaton for sentences
Motoyuki SUZUKI,
Shozo MAKINO (Computer Center / Graduate School of Information Sciences, TOHOKU University),
Hirotomo ASO (Graduate School of Engineering, TOHOKU University)
Statistical language models obtained from a large number of training samples
play an important role in speech recognition. In order to obtain higher
recognition performance, we should introduce long-distance correlations
between words. However, traditional statistical language models such as
word n-grams and ergodic HMMs are insufficient for expressing long-distance
correlations between words. In this paper, we propose an acquisition method
for a language model based on HMnet that takes long-distance correlations
and word location into consideration.
|
SP-23.7
|
Robust Dialogue-State Dependent Language Modeling Using Leaving-One-Out
Frank Wessel,
Andrea Baader (RWTH Aachen)
The use of dialogue-state dependent language models in automatic inquiry systems can improve speech recognition and understanding if a reasonable prediction of the dialogue state is feasible. In this paper, the dialogue state is defined as the set of parameters contained in the system prompt. For each dialogue state a separate language model is constructed. In order to obtain robust language models despite the small amount of training data, we propose to interpolate all of the dialogue-state dependent language models linearly for each dialogue state and to train the large number of resulting interpolation weights with the EM algorithm in combination with leaving-one-out. We present experimental results on a small Dutch corpus recorded in the Netherlands with a train timetable information system and show that the perplexity and the word error rate can be reduced significantly.
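The EM re-estimation of linear interpolation weights can be sketched in its standard form (the leaving-one-out refinement the paper adds is omitted here). The input format is an assumption: for each held-out event, the probability each component model assigns to it.

```python
def em_weights(component_probs, iters=50):
    """Standard EM for mixture weights.

    component_probs: list of events, each a list of per-model
    probabilities for that event.
    """
    m = len(component_probs[0])
    w = [1.0 / m] * m  # start from uniform weights
    for _ in range(iters):
        acc = [0.0] * m
        for probs in component_probs:
            # E-step: posterior responsibility of each model.
            denom = sum(wi * pi for wi, pi in zip(w, probs))
            for i in range(m):
                acc[i] += w[i] * probs[i] / denom
        # M-step: new weights are average responsibilities.
        w = [a / len(component_probs) for a in acc]
    return w
```

When one component consistently assigns higher probability to the held-out data, its weight converges toward 1; leaving-one-out is what keeps this estimate honest on small corpora.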
|
SP-23.8
|
On-line Algorithms for Combining Language Models
Adam Kalai,
Stanley Chen,
Avrim Blum,
Ronald Rosenfeld (Carnegie Mellon University)
Multiple language models are combined for many tasks
in language modeling, such as domain and topic adaptation. In
this work, we compare on-line algorithms from machine learning
to existing algorithms for combining language models. On-line
algorithms developed for this problem have parameters that are
updated dynamically to adapt to a data set during evaluation.
On-line analysis provides guarantees that these algorithms
will perform nearly as well as the best model chosen in hindsight
from a large class of models, e.g., the set of all static mixtures.
We describe several on-line algorithms and present results
comparing these techniques with existing language modeling
combination methods on the task of domain adaptation.
We demonstrate that, in some situations, on-line
techniques can significantly outperform static mixtures
(by over 10% in terms of perplexity) and are especially effective
when the nature of the test data is unknown or changes over time.
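One simple on-line combination scheme of the family discussed here is a Bayesian-style posterior update: after each word, every component's weight is multiplied by the probability that model assigned to the word, then renormalized. This sketch is illustrative and is not claimed to be any specific algorithm from the paper.

```python
def online_mix(prob_streams):
    """Adapt mixture weights during evaluation.

    prob_streams: per word, a list of per-model probabilities.
    Returns the mixture probability assigned to each word.
    """
    m = len(prob_streams[0])
    w = [1.0 / m] * m
    out = []
    for probs in prob_streams:
        p_mix = sum(wi * pi for wi, pi in zip(w, probs))
        out.append(p_mix)
        # Posterior update: reweight models by their predictions.
        w = [wi * pi / p_mix for wi, pi in zip(w, probs)]
    return out
```

Because the weights track whichever model has been predicting well recently, such schemes can beat any static mixture when the test data drifts between domains.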
|