Abstract: Session SP-16

SP-16.1
Incorporating Confidence Measures in the Dutch Train Timetable Information System Developed in the Arise Project
Gies Bouwman,
Janienke Sturm,
Louis Boves (University of Nijmegen)
The use of Confidence Measures (CMs) in Spoken Dialog
System (SDS) applications to skip verification turns for
'reliably correctly recognised utterances' can greatly
reduce average dialog length, which enhances usability
and increases user satisfaction [1]. This paper gives a
brief but clear review of the method of CM assessment
presented in [2]. It then demonstrates how the Dutch
ARISE (Automatic Railways Information Systems in Europe)
SDS was equipped with this technology and shows in
detail how the parameters involved should be optimised.
The evaluation reveals and explains a typical behaviour
of this method in train timetable information systems
and similar applications. This leads to a set of
conclusions that were not foreseen when the method was
first developed for a directory information system. The
paper ends with an outlook on solutions in new research
directions.
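The abstract does not spell out the CM assessment method of [2]. As a hedged illustration of the general idea only, the sketch below skips the explicit verification turn whenever a recognised value's confidence score clears a threshold, and picks that threshold on development data by grid search. The `Hypothesis` fields, the cost weights (one verification turn vs. one wrongly accepted value) and the toy station data are all hypothetical, not the ARISE parameters.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    value: str         # recognised slot value, e.g. a station name
    confidence: float  # hypothetical CM score in [0, 1]
    correct: bool      # ground truth, only known on development data


def needs_verification(hyp, threshold):
    """Ask an explicit verification question only when the CM is below the threshold."""
    return hyp.confidence < threshold


def cost(hyps, threshold, verify_cost=1.0, false_accept_cost=5.0):
    """Hypothetical dialog cost: every verification turn costs verify_cost,
    every wrong value accepted without verification costs false_accept_cost."""
    total = 0.0
    for h in hyps:
        if needs_verification(h, threshold):
            total += verify_cost
        elif not h.correct:
            total += false_accept_cost
    return total


def best_threshold(hyps, candidates):
    """Grid search for the threshold with the lowest total cost on development data."""
    return min(candidates, key=lambda t: cost(hyps, t))


# Toy development data (hypothetical).
dev = [
    Hypothesis("Amsterdam", 0.95, True),
    Hypothesis("Utrecht", 0.90, True),
    Hypothesis("Assen", 0.60, False),
    Hypothesis("Nijmegen", 0.55, True),
    Hypothesis("Ede", 0.30, False),
]
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]
print(best_threshold(dev, candidates))
```

With these toy numbers the search settles on 0.75: the two high-confidence correct values go through without verification, while the three doubtful ones are verified. Changing the relative cost of a false accept moves the optimum, which is why such parameters have to be tuned per application.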

SP-16.2
HMM and Neural Network based Speech Act Detection
Klaus Ries (Interactive Systems Labs at Carnegie Mellon University and University of Karlsruhe)
We present an incremental lattice generation approach to speech act
detection for spontaneous and overlapping speech in telephone
conversations (CallHome Spanish).
Once the initial HMM models have generated a reasonable set of
hypotheses, different models can be used at each stage of the process,
and the resulting lattices can be processed further by more complex models.
This study shows how neural networks can be used very
effectively in the classification of speech acts.
We find that speech acts are classified better by the neural-network-based
approach than by the more classical n-gram backoff model approach.
The best resulting neural network operates only on unigrams, and
integrating the n-gram backoff model as a prior reduces its performance.
The neural network is therefore likely to be more robust against errors from an LVCSR system
and can potentially be trained on a smaller database.
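To make "a neural network that operates only on unigrams" concrete, here is a minimal sketch, assuming the simplest possible network: a single-layer softmax classifier (no hidden layer) over bag-of-words counts, trained by stochastic gradient descent. The abstract does not describe the actual architecture; the class name, learning rate, epoch count and the English toy utterances are hypothetical.

```python
import math
from collections import Counter, defaultdict


def features(utterance):
    """Unigram (bag-of-words) counts: word order is deliberately ignored."""
    return Counter(utterance.split())


class UnigramSoftmax:
    """Single-layer softmax network over unigram counts (no hidden layer)."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = {c: defaultdict(float) for c in self.labels}  # per-class word weights, 0.0 for unseen words
        self.b = {c: 0.0 for c in self.labels}                 # per-class bias

    def probs(self, x):
        scores = {c: self.b[c] + sum(self.w[c][t] * n for t, n in x.items())
                  for c in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}

    def train(self, data, epochs=200, lr=0.5):
        """Plain stochastic gradient descent on the cross-entropy loss."""
        for _ in range(epochs):
            for text, label in data:
                x = features(text)
                p = self.probs(x)
                for c in self.labels:
                    g = p[c] - (1.0 if c == label else 0.0)  # d(loss)/d(score_c)
                    self.b[c] -= lr * g
                    for t, n in x.items():
                        self.w[c][t] -= lr * g * n

    def predict(self, text):
        p = self.probs(features(text))
        return max(p, key=p.get)


# Toy speech-act data (hypothetical, English for readability).
train_data = [
    ("where do you live", "question"),
    ("what time is it", "question"),
    ("i live in madrid", "statement"),
    ("it is late", "statement"),
    ("uh huh", "backchannel"),
    ("yeah uh huh", "backchannel"),
]
model = UnigramSoftmax(["question", "statement", "backchannel"])
model.train(train_data)
print(model.predict("what do you think"))
```

Because the model sees only unordered word counts, a single misrecognised word changes one feature rather than breaking an n-gram context, which is one plausible reading of the robustness argument above.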

SP-16.3
The LIMSI ARISE System for Train Travel Information
Lori F Lamel,
Sophie Rosset,
Jean-Luc S Gauvain,
Samir K Bennacef (LIMSI-CNRS)
In the context of the LE-3 ARISE project we have been developing a
dialog system for vocal access to rail travel information. The system
provides schedule information for the main French intercity
connections, as well as simulated fares and reservations, reductions
and services. Our goal is to obtain high dialog success rates with a
very open structure, where the user is free to ask any question or to
provide any information at any point in time. In order to improve
performance with such an open dialog strategy, we make use of implicit
confirmation using the caller's wording (when possible), and change to
a more constrained dialog level when the dialog is not going well. In
addition to our own assessment, the prototype system undergoes periodic
user evaluations carried out by our partners at the French
Railways.
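The two strategies named above (implicit confirmation in the caller's own words, and falling back to a more constrained dialog level when things go wrong) can be sketched as a toy dialog manager. This is not the LIMSI implementation: the slot set, the failure threshold and all prompt texts are hypothetical.

```python
from dataclasses import dataclass, field

REQUIRED_SLOTS = ("departure", "arrival", "date")  # hypothetical slot set


@dataclass
class DialogManager:
    max_problems: int = 2  # hypothetical: failed turns tolerated before constraining the dialog
    mode: str = "open"     # "open": any question or information at any time; "directed": one slot per turn
    problems: int = 0
    slots: dict = field(default_factory=dict)

    def missing(self):
        return [s for s in REQUIRED_SLOTS if s not in self.slots]

    def next_prompt(self, parsed, caller_words):
        """parsed: slot -> normalised value (empty if nothing was understood);
        caller_words: slot -> the words the caller actually used for that value."""
        if not parsed:
            self.problems += 1
            if self.problems >= self.max_problems:
                self.mode = "directed"  # fall back to a more constrained dialog level
        self.slots.update(parsed)
        # Implicit confirmation: echo the caller's own wording, not the normalised value.
        echo = ", ".join(f"{s}: {caller_words.get(s, v)}" for s, v in parsed.items())
        todo = self.missing()
        if not todo:
            question = "Shall I look up the connections?"
        elif self.mode == "directed":
            question = f"Please tell me only the {todo[0]}."
        else:
            question = "What other details of your trip can you give me?"
        return f"{echo}. {question}" if echo else question


dm = DialogManager()
print(dm.next_prompt({"departure": "Paris Gare de Lyon"}, {"departure": "Paris"}))
print(dm.next_prompt({}, {}))
print(dm.next_prompt({}, {}))
```

The first prompt confirms "Paris" exactly as the caller said it while the system internally stores the normalised station; after two turns with nothing understood, the manager switches to asking for one missing slot at a time.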

SP-16.4
Improving The Suitability Of Imperfect Transcriptions For Information Retrieval From Spoken Documents
Matthew A Siegler (Carnegie Mellon University),
Michael J. Witbrock (Justsystem Pittsburgh Research Center)
Recently there has been a considerable focus on information retrieval for multimedia databases.
When speech is used as the source material for multimedia indexing, the effect of transcriber
error on retrieval effectiveness must be considered. This paper describes a method for measuring
the relevance of documents to queries when information about the probability of word transcription
error is available. To support the use of this technique, a method is presented for estimating
word error probability in speech recognition engines that use word graphs (lattices). An information
retrieval experiment using this technique on a large corpus of spoken documents is discussed.
The method was able to reduce the difference in retrieval effectiveness between reference texts
and hypothesized texts by 13%-38% depending on the size of the document set.
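One standard way to estimate word error probabilities from a word graph is to compute each word's posterior probability with the forward-backward algorithm and take one minus it; the posteriors can then replace hard term counts in a relevance score. The sketch below is an assumption-laden illustration of that idea, not the paper's exact relevance measure: the lattice format, the toy edge weights and the simple summed-expected-count score are hypothetical.

```python
from collections import defaultdict


def edge_posteriors(edges, start, final):
    """Forward-backward over a word lattice (DAG).
    edges: list of (from_node, to_node, word, weight); node ids are assumed to be
    integers in topological order, and weights are linear-domain path scores."""
    alpha = defaultdict(float)
    beta = defaultdict(float)
    alpha[start] = 1.0
    for s, e, _, w in sorted(edges, key=lambda x: x[0]):  # forward pass
        alpha[e] += alpha[s] * w
    beta[final] = 1.0
    for s, e, _, w in sorted(edges, key=lambda x: x[1], reverse=True):  # backward pass
        beta[s] += w * beta[e]
    total = alpha[final]  # total score of all complete paths
    return [(word, alpha[s] * w * beta[e] / total) for s, e, word, w in edges]


def expected_term_counts(posteriors):
    """Expected term frequency: each lattice occurrence counts with its posterior,
    i.e. 1 - its estimated word error probability."""
    counts = defaultdict(float)
    for word, p in posteriors:
        counts[word] += p
    return counts


def relevance(query, counts):
    """Hypothetical score: sum of the expected counts of the query terms."""
    return sum(counts.get(t, 0.0) for t in query)


# Toy lattice for one spoken document (hypothetical weights).
edges = [
    (0, 1, "train", 0.3), (0, 1, "rain", 0.1),
    (1, 2, "to", 0.5),
    (2, 3, "phoenix", 0.2), (2, 3, "felix", 0.2),
]
post = edge_posteriors(edges, 0, 3)
counts = expected_term_counts(post)
print(dict(counts))
print(relevance(["train", "phoenix"], counts))
```

Here "train" gets posterior 0.75 and "phoenix" 0.5, so a query containing both scores 1.25 instead of the 2 a hard one-best transcript would give. Summing edge posteriors can double-count a word that appears on overlapping edges at the same time; a real system would merge such edges first.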

SP-16.5
Automatic Topic Identification for Two-Level Call Routing
John A Golden,
Owen Kimball,
Man-Hung Siu,
Herbert Gish (BBN Technologies/GTE Internetworking)
This paper presents an approach to routing telephone calls
automatically, based upon their speech content. Our data consist of a
set of calls collected from a customer-service center with a two-level
menu, which allows jumping past the second level, and we view the
routing of these calls as a topic-identification problem. Our topic
identifier employs a multinomial model for keyword occurrences. We
describe the call-routing task in detail, discuss the multinomial
model, and present experiments which investigate several issues that
arise from using the model for this task.
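A multinomial model for keyword occurrences leads to a naive-Bayes style decision: pick the route maximising $\log P(r) + \sum_k n_k \log P(k \mid r)$, where $n_k$ is the count of keyword $k$ in the call. The sketch below assumes add-one smoothing, a hand-picked keyword list, and hypothetical "first-level/second-level" route labels, so choosing a leaf also fixes the first-level menu choice; none of these details come from the paper.

```python
import math
from collections import Counter


def train_multinomial(calls, keywords, alpha=1.0):
    """calls: list of (words, route). Returns log priors and add-alpha smoothed
    log keyword probabilities for each route."""
    routes = Counter(route for _, route in calls)
    counts = {r: Counter() for r in routes}
    for words, route in calls:
        counts[route].update(w for w in words if w in keywords)  # only keyword occurrences count
    log_prior = {r: math.log(n / len(calls)) for r, n in routes.items()}
    log_prob = {}
    for r, c in counts.items():
        total = sum(c.values()) + alpha * len(keywords)
        log_prob[r] = {k: math.log((c[k] + alpha) / total) for k in keywords}
    return log_prior, log_prob


def route_call(words, keywords, log_prior, log_prob):
    """Pick the route maximising log P(route) + sum_k n_k * log P(k | route)."""
    n = Counter(w for w in words if w in keywords)

    def score(r):
        return log_prior[r] + sum(cnt * log_prob[r][k] for k, cnt in n.items())

    return max(log_prior, key=score)


# Hypothetical keyword list and transcribed training calls.
KEYWORDS = {"pay", "bill", "wrong", "charge", "line", "dial", "tone", "dead"}
CALLS = [
    ("i want to pay my bill".split(), "billing/payment"),
    ("pay the bill online".split(), "billing/payment"),
    ("my bill amount is wrong".split(), "billing/dispute"),
    ("the charge on my bill is wrong".split(), "billing/dispute"),
    ("no dial tone on my line".split(), "repair/line"),
    ("my line is dead".split(), "repair/line"),
]
lp, lk = train_multinomial(CALLS, KEYWORDS)
route = route_call("the line is dead".split(), KEYWORDS, lp, lk)
print(route, route.split("/")[0])  # leaf route and its first-level menu
```

Non-keyword words are simply ignored, which is what makes the model cheap to apply to keyword-spotting output; the smoothing constant keeps unseen keyword/route pairs from zeroing out a route.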

SP-16.6
Named Entity Tagged Language Models
Yoshihiko Gotoh,
Steve Renals,
Gethin Williams (University of Sheffield)
We introduce Named Entity (NE) Language Modelling, a stochastic finite
state machine approach to identifying both words and NE categories from
a stream of spoken data. We provide an overview of our approach to NE
tagged language model (LM) generation together with results of the
application of such a LM to the task of out-of-vocabulary (OOV) word
reduction in large vocabulary speech recognition. Using the Wall Street
Journal and Broadcast News corpora, it is shown that the tagged LM was
able to reduce the overall word error rate by 14%, detecting up to 70%
of previously OOV words. We also describe an example of the direct
tagging of spoken data with NE categories.
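The core idea of an NE tagged LM, predicting an NE category and then a word within it, can be illustrated with a much simpler class-based bigram than the stochastic finite state machine the abstract describes: $P(w \mid w') = P(\mathrm{tag}(w) \mid \mathrm{tag}(w'))\,P(w \mid \mathrm{tag}(w))$. In this sketch the NE lexicon, the uniform within-class distribution and the unsmoothed counts are all assumptions chosen for brevity.

```python
from collections import Counter, defaultdict

# Hypothetical NE lexicon mapping words to NE categories.
NE_LEXICON = {"london": "<LOCATION>", "paris": "<LOCATION>",
              "smith": "<PERSON>", "jones": "<PERSON>"}


def tag(token):
    """Replace a named-entity word by its category tag; other words pass through."""
    return NE_LEXICON.get(token, token)


class TaggedBigramLM:
    """Class-based bigram: P(w | prev) = P(tag(w) | tag(prev)) * P(w | tag(w)).
    Maximum-likelihood counts, no smoothing; P(w | tag) is uniform over the lexicon."""

    def __init__(self, corpus):
        self.bigrams = defaultdict(Counter)
        for sentence in corpus:
            toks = ["<s>"] + [tag(w) for w in sentence.split()]
            for prev, cur in zip(toks, toks[1:]):
                self.bigrams[prev][cur] += 1
        self.members = defaultdict(list)
        for word, category in NE_LEXICON.items():
            self.members[category].append(word)

    def prob(self, prev, word):
        counts = self.bigrams.get(tag(prev), Counter())
        total = sum(counts.values())
        if total == 0:
            return 0.0
        cls = tag(word)
        p = counts[cls] / total
        if cls in self.members:  # spread the class probability over its members
            p /= len(self.members[cls])
        return p


lm = TaggedBigramLM(["flights to london", "call smith in london"])
print(lm.prob("to", "paris"))  # "paris" never occurs in training
```

"paris" and "jones" never occur in the toy training text, yet they receive probability through their categories; this is the mechanism by which a tagged LM can cover words that a plain word-level LM would treat as out-of-vocabulary.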

SP-16.7
Speech Translation: Coupling of Recognition and Translation
Hermann Ney (Lehrstuhl fuer Informatik VI, RWTH Aachen, University of Technology, D-52056 Aachen, Germany)
In speech translation, we are faced with the problem
of how to couple the speech recognition process and
the translation process. Starting from the Bayes
decision rule for speech translation, we analyze
how the interaction between the recognition process
and the translation process can be modelled. In the
light of this decision rule, we discuss existing
approaches to speech translation. None of the
existing approaches seems to have addressed this
direct interaction. We suggest two new methods: the
local averaging approximation and monotone
alignments.
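For reference, the Bayes decision rule for speech translation is usually written as below. This is a hedged reconstruction in standard notation, which may differ from the paper's: $x_1^T$ are the acoustic vectors, $f_1^J$ the source-language words and $e_1^I$ the target-language words. The last step assumes that the acoustics depend only on the source words.

```latex
\begin{aligned}
\hat{e}_1^I &= \operatorname*{argmax}_{e_1^I} \Pr(e_1^I \mid x_1^T)
             = \operatorname*{argmax}_{e_1^I} \left\{ \Pr(e_1^I)\,\Pr(x_1^T \mid e_1^I) \right\} \\
            &= \operatorname*{argmax}_{e_1^I} \left\{ \Pr(e_1^I) \sum_{f_1^J} \Pr(f_1^J \mid e_1^I)\,\Pr(x_1^T \mid f_1^J, e_1^I) \right\} \\
            &\approx \operatorname*{argmax}_{e_1^I} \left\{ \Pr(e_1^I) \sum_{f_1^J} \Pr(f_1^J \mid e_1^I)\,\Pr(x_1^T \mid f_1^J) \right\}
\end{aligned}
```

A purely serial system instead replaces the sum by the single best recognition result, $\hat{f}_1^J = \operatorname*{argmax}_{f_1^J} \{\Pr(f_1^J)\,\Pr(x_1^T \mid f_1^J)\}$, and then translates $\hat{f}_1^J$ alone, so the recognition and translation models never interact inside the search.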

SP-16.8
Probabilistic models for topic detection and tracking
Frederick G Walls,
Hubert Jin,
Sreenivasa Sista,
Richard Schwartz (GTE/BBN Technologies)
We present probabilistic models for use in detecting and tracking topics in
broadcast news stories. We formally describe our information retrieval (IR)
models, discuss the Topic Detection and Tracking (TDT) initiative, and develop
the application of probabilistic models to the topic detection and tracking
tasks, together with several enhancements. We discuss four variations of these
models and report preliminary test results on the current TDT corpus.
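The abstract does not say which probabilistic models were used. As one common formulation, offered as an assumption rather than any of the paper's four variations, a tracker can score a new story by its average log-likelihood ratio under a topic unigram model (interpolated with a background model) versus the background model alone, and declare it on-topic above a threshold. The interpolation weight, the add-one background smoothing, the zero threshold and the toy stories are hypothetical.

```python
import math
from collections import Counter


def unigram_counts(texts):
    counts = Counter(w for t in texts for w in t.split())
    return counts, sum(counts.values())


def story_score(story, topic, background, lam=0.5):
    """Average log-likelihood ratio of a story under an interpolated topic model
    versus the background model; positive values favour the topic."""
    tc, tn = topic
    bc, bn = background
    words = story.split()
    total = 0.0
    for w in words:
        p_bg = (bc[w] + 1) / (bn + len(bc) + 1)  # add-one smoothing with one unknown-word slot
        p_topic = tc[w] / tn                      # unsmoothed topic estimate
        total += math.log((lam * p_topic + (1 - lam) * p_bg) / p_bg)
    return total / len(words)


def on_topic(story, topic, background, threshold=0.0):
    """Tracking decision with a hypothetical fixed threshold."""
    return story_score(story, topic, background) > threshold


# Toy training stories for one topic, and a toy background collection.
topic = unigram_counts(["earthquake hits city", "earthquake death toll rises"])
background = unigram_counts([
    "stocks rise in new york", "earthquake hits city", "team wins final match",
    "earthquake death toll rises", "city council meets",
])
print(on_topic("earthquake toll rises again", topic, background))
print(on_topic("team wins match", topic, background))
```

Normalising by story length keeps long and short stories on a comparable scale, so a single threshold can be applied to all of them; in practice that threshold would be tuned on held-out data.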