Authors:
Mikel Peñagarikano,
German Bordel,
Amparo Varona,
Karmele López de Ipiña,
Page (NA) Paper number 1698
Abstract:
If the objective of a Continuous Automatic Speech Understanding system
is not speech-to-text translation, words are not strictly needed,
and the use of alternative lexical units (LUs) offers a new degree
of freedom for improving system performance. Consequently,
we experimentally explore some methods to automatically extract a set
of LUs from a Spanish training corpus and verify that the system can
be improved in two ways: reducing the computational costs and increasing
the recognition rates. Moreover, preliminary results indicate that,
even if the system target is speech-to-text translation, using non-word
units and post-processing the output to produce the corresponding word
chain outperforms the word-based system.
Authors:
Katsutoshi Ohtsuki,
Sadaoki Furui,
Atsushi Iwasaki,
Naoyuki Sakurai,
Page (NA) Paper number 1924
Abstract:
This paper proposes a new formulation for speech recognition/understanding
systems, in which the a posteriori probability of the message that
the speaker intends to convey, given an observed acoustic sequence,
is maximized. This is an extension of the current criterion that maximizes
the probability of a word sequence. Among the various possible representations,
we employ a co-occurrence score of words, measured by mutual information,
as the conditional probability of a word sequence occurring in a given
message. The word sequence hypotheses obtained by bigram and trigram
language models are rescored using the co-occurrence score. Experimental
results show that the word accuracy is improved by this method. Topic-words,
which represent the content of a speech signal, are then extracted from
speech recognition results based on the significance score of each
word. When five topic-words are extracted for each broadcast-news article,
82.8% of them are correct on average. This paper also proposes a verbalization-dependent
language model, which is useful for Japanese dictation systems.
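The rescoring idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes pointwise mutual information estimated from document-level co-occurrence counts, a hypothesis score that sums the pairwise values, and a linear combination with the language-model score. The names train_pmi, cooccurrence_score, rescore, and the weight parameter are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def train_pmi(documents):
    """Estimate pointwise mutual information for word pairs.

    Assumption: two words "co-occur" when they appear in the same document.
    Pairs are stored with their two words in sorted order.
    """
    n_docs = len(documents)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each word pair
    for doc in documents:
        vocab = sorted(set(doc))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return {
        (a, b): math.log(n_ab * n_docs / (word_df[a] * word_df[b]))
        for (a, b), n_ab in pair_df.items()
    }


def cooccurrence_score(words, pmi):
    """Sum the PMI of all distinct word pairs in a hypothesis (unseen pairs count as 0)."""
    vocab = sorted(set(words))
    return sum(pmi.get(pair, 0.0) for pair in combinations(vocab, 2))


def rescore(nbest, pmi, weight=1.0):
    """Pick the best hypothesis from (words, log_score) pairs after adding the weighted score."""
    return max(nbest, key=lambda h: h[1] + weight * cooccurrence_score(h[0], pmi))
```

In this sketch, a hypothesis whose words tend to appear together in the training documents gains score relative to an acoustically similar hypothesis containing an out-of-context word. The weight would need tuning on held-out data.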
Authors:
Edward C Kaiser,
Michael Johnston,
Peter A Heeman,
Page (NA) Paper number 2206
Abstract:
The natural language processing component of a speech understanding
system is commonly a robust, semantic parser, implemented as either
a chart-based transition network, or as a generalized left-right (GLR)
parser. In contrast, we are developing a robust, semantic parser that
is a single, predictive finite-state machine. Our approach is motivated
by our belief that such a finite-state parser can ultimately provide
an efficient vehicle for tightly integrating higher-level linguistic
knowledge into speech recognition. We report on our development of
this parser, with an example of its use, and a description of how it
compares to both finite-state predictors and chart-based semantic parsers,
while combining the elements of both.
Authors:
Marcello Federico,
Fabio Brugnara,
Roberto Gretter,
Page (NA) Paper number 2458
Abstract:
This paper reports on the field test of a speech-based data-entry system
developed as a follow-up to an EC-funded project. The application domain
is the data-entry of personnel absence records from a huge historical
paper file (about 100,000 records). The application was required by
the personnel office of a public administration. The tested system
proved both sufficiently simple to make a detailed analysis feasible,
and sufficiently representative of the potential of spoken data-entry.
Authors:
Bor-Shen Lin, National Taiwan University (Taiwan)
Lin-Shan Lee, National Taiwan University and Academia Sinica (Taiwan)
Page (NA) Paper number 2273
Abstract:
With improved speech understanding technology, many successful working
systems have been developed. However, the high degree of complexity
and the wide variety of design methodologies make performance evaluation
and error analysis for such systems very difficult. Metrics for
individual modules, such as word accuracy, spotting rate, language
model coverage, and slot accuracy, are often helpful, but it is
difficult to select or tune the individual modules, or to determine
what share of the understanding errors each module contributed,
based on such metrics. In this paper, a new framework for performance
evaluation and error analysis for speech understanding systems is proposed
based on the comparison with the 'best-matched' references obtained
from the word graphs with the target words and tags given. In this
framework, all test utterances can be classified based on the error
types, and various understanding metrics can be obtained accordingly.
Error analysis approaches based on an error plane are then proposed,
with which the sources for understanding errors (e.g. poor acoustic
recognition, poor language model, search error, etc.) can be identified
for each utterance. Such a framework will be very helpful for design
and analysis of speech understanding systems.
Authors:
D. Llorens, Unitat Predepartamental d'Informatica, Universitat Jaume I, Castello, Spain. (Spain)
F. Casacuberta, Dpto. Sistemas Informaticos y Computacion, Universidad Politecnica de Valencia, Valencia, Spain. (Spain)
E. Segarra, Dpto. Sistemas Informaticos y Computacion, Universidad Politecnica de Valencia, Valencia, Spain. (Spain)
J.A. Sánchez, Dpto. Sistemas Informaticos y Computacion, Universidad Politecnica de Valencia, Valencia, Spain. (Spain)
P. Aibar, Unitat Predepartamental d'Informatica, Universitat Jaume I, Castello, Spain. (Spain)
M.J. Castro, Dpto. Sistemas Informaticos y Computacion, Universidad Politecnica de Valencia, Valencia, Spain. (Spain)
Page (NA) Paper number 1551
Abstract:
Current speech technology allows us to build efficient speech recognition
systems. However, learning the models of the knowledge sources in a speech
recognition system remains an open problem. In addition, low computational
requirements are crucial to building real-time systems. ATROS is an
automatic speech recognition system whose acoustic, lexical, and syntactical
models can be learnt automatically from training data by using similar
techniques. In this paper, an improved version of ATROS which can deal
with large smoothed language models and with large vocabularies is
presented. This version supports acoustic and syntactical models trained
with advanced grammatical inference techniques. It also incorporates
new data structures and improved search algorithms to reduce the computational
requirements for decoding. The system has been tested on a Spanish
task of queries to a geographical database (with a vocabulary of 1,208
words).
Authors:
Jason C Davenport,
Richard Schwartz,
Long Nguyen,
Page (NA) Paper number 2409
Abstract:
In this paper we present several algorithms that speed up our BBN BYBLOS
decoder. We briefly describe the techniques that we have used before
this year. Then we present new techniques that speed up the recognition
search by a factor of 10 with little effect on accuracy using a combination
of Fast Gaussian Computation, grammar spreading, and grammar caching,
within the 2-Pass n-best paradigm. We also describe our decoder metering
strategy, which allows us to conveniently test for search errors. Finally,
we describe a grammar compression technique that decreases the storage
needed for each additional n-gram to only 10 bits.
Authors:
William M Fisher,
Page (NA) Paper number 1926
Abstract:
Adopting concepts from statistical language modeling and rule-based
transformations can lead to effective and efficient text-to-phone (TTP)
functions. We present here the methods and results of one such effort,
resulting in a relatively compact and fast set of TTP rules that achieves
94.5% segmental phonemic accuracy.
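The abstract does not define "segmental phonemic accuracy". One common reading, assumed here and not confirmed by the source, is one minus the phone-level edit errors (substitutions, deletions, insertions) divided by the number of reference phones. The sketch below computes that quantity; edit_distance and phone_accuracy are hypothetical names.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference phone
                curr[j - 1] + 1,         # insertion of a hypothesis phone
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def phone_accuracy(pairs):
    """Assumed metric: 1 - (total edit errors / total reference phones)."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return 1.0 - errors / total
```

Each item in pairs is a (reference, predicted) pair of phone lists for one word, so the accuracy is pooled over all reference phones rather than averaged per word.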