Authors:
Gies Bouwman,
Janienke Sturm,
Louis Boves,
Page (NA) Paper number 1504
Abstract:
The use of Confidence Measures (CMs) in Spoken Dialog System (SDS)
applications to suppress the number of verification turns for 'reliably
correctly recognised utterances' can greatly reduce average dialog
length, which enhances usability and increases user satisfaction [1].
This paper gives a brief but clear review of the method of CM assessment,
which was presented in [2]. It proceeds by demonstrating how the Dutch
ARISE (Automatic Railways Information Systems in Europe) SDS was equipped
with this technology and shows in detail how the parameters involved
are to be optimised. The evaluation reveals and explains a typical
behaviour of this method with train timetable information-like
systems. This results in a set of conclusions that were not foreseen
when the method was first developed for a directory information system.
The paper ends with an outlook for solutions in new research directions.
Authors:
Klaus Ries,
Page (NA) Paper number 2173
Abstract:
We present an incremental lattice generation approach to speech act
detection for spontaneous and overlapping speech in telephone conversations
(CallHome Spanish). At each stage of the process it is therefore possible
to use different models after the initial HMM models have generated
a reasonable set of hypotheses. These lattices can then be processed
further by more complex models. This study shows how neural networks
can be used very effectively in the classification of speech acts.
We find that speech acts can be classified better using the neural
net based approach than using the more classical ngram backoff model
approach. The best resulting neural network operates only on unigrams
and the integration of the ngram backoff model as a prior to the model
reduces the performance of the model. The neural network is therefore
more likely to be robust against errors from an LVCSR system and can potentially
be trained from a smaller database.
Authors:
Lori F Lamel,
Sophie Rosset,
Jean-Luc S Gauvain,
Samir K Bennacef,
Page (NA) Paper number 2240
Abstract:
In the context of the LE-3 ARISE project we have been developing a
dialog system for vocal access to rail travel information. The system
provides schedule information for the main French intercity connections,
as well as simulated fares and reservations, reductions and services.
Our goal is to obtain high dialog success rates with a very open structure,
where the user is free to ask any question or to provide any information
at any point in time. In order to improve performance with such an
open dialog strategy, we make use of implicit confirmation using the
caller's wording (when possible), and change to a more constrained dialog
level when the dialog is not going well. In addition to our own assessment,
the prototype system undergoes periodic user evaluations carried out
by our partners at the French Railways.
Authors:
Matthew A Siegler,
Michael J. Witbrock,
Page (NA) Paper number 2442
Abstract:
Recently there has been a considerable focus on information retrieval
for multimedia databases. When speech is used as the source material
for multimedia indexing, the effect of transcriber error on retrieval
effectiveness must be considered. This paper describes a method for
measuring the relevance of documents to queries when information about
the probability of word transcription error is available. To support
the use of this technique, a method is presented for estimating word
error probability in speech recognition engines that use word graphs
(lattices). An information retrieval experiment using this technique
on a large corpus of spoken documents is discussed. The method was
able to reduce the difference in retrieval effectiveness between reference
texts and hypothesized texts by 13%-38% depending on the size of the
document set.
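As a rough illustration of the idea in this abstract, the sketch below scores a document against a query using expected term counts, where each hypothesized word carries a probability of having been transcribed correctly. The scoring function, the names `expected_counts` and `score`, and the example probabilities are assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict

def expected_counts(hypothesis):
    """Sum per-word correctness probabilities into expected term counts.

    `hypothesis` is a list of (word, p_correct) pairs. The probabilities are
    assumed to come from a recognizer's word graph (hypothetical input).
    """
    counts = defaultdict(float)
    for word, p_correct in hypothesis:
        counts[word] += p_correct
    return dict(counts)

def score(query_terms, hypothesis):
    """Relevance as the sum of expected counts of the query terms (assumed)."""
    counts = expected_counts(hypothesis)
    return sum(counts.get(term, 0.0) for term in query_terms)

# Two hypothetical transcripts: doc_a is recognized with high confidence,
# doc_b contains a low-confidence match for "train".
doc_a = [("train", 0.9), ("to", 0.8), ("paris", 0.95)]
doc_b = [("train", 0.3), ("two", 0.6), ("ferris", 0.4)]
query = ["train", "paris"]

score_a = score(query, doc_a)
score_b = score(query, doc_b)
```

Under this assumed scoring, a low-confidence match contributes less to relevance than a high-confidence one, so `doc_a` ranks above `doc_b`.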
Authors:
John A Golden,
Owen Kimball,
Man-Hung Siu,
Herbert Gish,
Page (NA) Paper number 2468
Abstract:
This paper presents an approach to routing telephone calls automatically,
based upon their speech content. Our data consist of a set of calls
collected from a customer-service center with a two-level menu, which
allows jumping past the second level, and we view the routing of these
calls as a topic-identification problem. Our topic identifier employs
a multinomial model for keyword occurrences. We describe the call-routing
task in detail, discuss the multinomial model, and present experiments
which investigate several issues that arise from using the model for
this task.
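A minimal sketch of topic identification with a multinomial keyword model follows. It assumes a uniform prior over topics and add-one smoothing; the keyword list, the training calls, and the function names `train` and `classify` are hypothetical and not drawn from the paper.

```python
import math
from collections import Counter

def train(calls_by_topic, vocabulary):
    """Estimate per-topic keyword probabilities with add-one smoothing (assumed)."""
    models = {}
    for topic, calls in calls_by_topic.items():
        counts = Counter(w for call in calls for w in call if w in vocabulary)
        total = sum(counts.values()) + len(vocabulary)
        models[topic] = {w: (counts[w] + 1) / total for w in vocabulary}
    return models

def classify(call, models):
    """Pick the topic with the highest multinomial log-likelihood of the keywords.

    Words outside the keyword vocabulary are ignored. The multinomial
    coefficient is the same for every topic, so it is omitted.
    """
    def log_likelihood(model):
        return sum(math.log(model[w]) for w in call if w in model)
    return max(models, key=lambda topic: log_likelihood(models[topic]))

# Hypothetical keyword vocabulary and training calls for two destinations.
vocabulary = {"bill", "payment", "outage", "repair"}
calls_by_topic = {
    "billing": [["bill", "payment"], ["payment", "bill", "bill"]],
    "service": [["outage", "repair"], ["repair", "outage", "outage"]],
}
models = train(calls_by_topic, vocabulary)
routed = classify(["my", "bill", "payment", "is", "wrong"], models)
```

Here the call would be routed to the hypothetical `billing` destination, because its keywords are more likely under that topic's model.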
Authors:
Yoshihiko Gotoh,
Steve Renals,
Gethin Williams,
Page (NA) Paper number 1984
Abstract:
We introduce Named Entity (NE) Language Modelling, a stochastic finite
state machine approach to identifying both words and NE categories
from a stream of spoken data. We provide an overview of our approach
to NE tagged language model (LM) generation together with results of
the application of such a LM to the task of out-of-vocabulary (OOV)
word reduction in large vocabulary speech recognition. Using the Wall
Street Journal and Broadcast News corpora, it is shown that the tagged
LM was able to reduce the overall word error rate by 14%, detecting
up to 70% of previously OOV words. We also describe an example of the
direct tagging of spoken data with NE categories.
Authors:
Hermann Ney, Lehrstuhl fuer Informatik VI, RWTH Aachen, University of Technology, D-52056 Aachen, Germany
Page (NA) Paper number 1675
Abstract:
In speech translation, we are faced with the problem of how to couple
the speech recognition process and the translation process. Starting
from the Bayes decision rule for speech translation, we analyze how
the interaction between the recognition process and the translation
process can be modelled. In the light of this decision rule, we discuss
the already existing approaches to speech translation. None of the
existing approaches seems to have addressed this direct interaction.
We suggest two new methods, the local averaging approximation and the
monotone alignments.
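For orientation, the Bayes decision rule for speech translation is commonly written as below, with $x$ the acoustic input, $f$ a source-language word sequence, and $e$ a target-language word sequence. The notation and the conditional-independence assumption $\Pr(x \mid f, e) = \Pr(x \mid f)$ belong to this sketch and are not quoted from the paper.

```latex
\hat{e} \;=\; \arg\max_{e} \Pr(e \mid x)
        \;=\; \arg\max_{e} \Big\{ \Pr(e) \sum_{f} \Pr(f \mid e)\, \Pr(x \mid f) \Big\}
```

The sum over $f$ is what couples recognition and translation: every source sentence hypothesis contributes to the score of a target sentence. A common simplification replaces the sum by a maximum over $f$.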
Authors:
Frederick G Walls,
Hubert Jin,
Sreenivasa Sista,
Richard Schwartz,
Page (NA) Paper number 2404
Abstract:
We present probabilistic models for use in detecting and tracking topics
in broadcast news stories. Our information retrieval (IR) models are
formally explained. The Topic Detection and Tracking (TDT) initiative
is discussed. The application of probabilistic models to the topic
detection and tracking tasks is developed, and enhancements are discussed.
We discuss four variations of these models, and we report our preliminary
test results from the current TDT corpus.