SP-16.1

Incorporating Confidence Measures in the Dutch Train Timetable Information System Developed in the Arise Project
Gies Bouwman, Janienke Sturm, Louis Boves (University of Nijmegen)

The use of Confidence Measures (CMs) in Spoken Dialog System (SDS) applications to reduce the number of verification turns for 'reliably correctly recognised utterances' can greatly shorten the average dialog, which enhances usability and increases user satisfaction [1]. This paper briefly reviews the method of CM assessment presented in [2]. It then demonstrates how the Dutch ARISE (Automatic Railways Information Systems in Europe) SDS was equipped with this technology and shows in detail how the parameters involved are to be optimised. The evaluation reveals and explains a typical behaviour of this method in systems that resemble train timetable information systems. This results in a set of conclusions that were not foreseen when the method was first developed for a directory information system. The paper ends with an outlook on solutions in new research directions.
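As a rough illustration of the idea in this abstract (not the method of [2]), a dialog manager could skip the verification turn whenever a confidence score clears a threshold, and tune that threshold on held-out utterances. The function names, the cost weights and the toy data below are all assumptions made for this sketch.

```python
def needs_verification(confidence, threshold):
    """Ask the user to confirm only when the confidence score is below the threshold."""
    return confidence < threshold


def pick_threshold(scored_utterances, candidates, fa_cost=2.0, fr_cost=1.0):
    """Pick the candidate threshold with the lowest weighted error cost.

    scored_utterances: list of (confidence, was_correct) pairs from held-out data.
    fa_cost / fr_cost: hypothetical costs of accepting a misrecognition
    (false accept) and of verifying a correct recognition (false reject).
    """
    def cost(threshold):
        false_accepts = sum(
            1 for conf, ok in scored_utterances if conf >= threshold and not ok
        )
        false_rejects = sum(
            1 for conf, ok in scored_utterances if conf < threshold and ok
        )
        return fa_cost * false_accepts + fr_cost * false_rejects

    return min(candidates, key=cost)


# Toy held-out data: (confidence score, recognition was correct)
held_out = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.3, False)]
threshold = pick_threshold(held_out, [0.1, 0.5, 0.7, 0.95])
```

Weighting false accepts more heavily than false rejects reflects the assumption that an unverified misrecognition hurts the dialog more than one extra verification turn.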

SP-16.2

HMM and Neural Network based Speech Act Detection
Klaus Ries (Interactive Systems Labs at Carnegie Mellon University and University of Karlsruhe)

We present an incremental lattice generation approach to speech act detection for spontaneous and overlapping speech in telephone conversations (CallHome Spanish). At each stage of the process it is therefore possible to use different models after the initial HMM models have generated a reasonable set of hypotheses. These lattices can then be processed further by more complex models. This study shows how neural networks can be used very effectively in the classification of speech acts. We find that speech acts can be classified better using the neural net based approach than using the more classical ngram backoff model approach. The best resulting neural network operates only on unigrams, and integrating the ngram backoff model as a prior reduces the performance of the model. The neural network is therefore more likely to be robust against errors from an LVCSR system and can potentially be trained from a smaller database.

SP-16.3

The LIMSI ARISE System for Train Travel Information
Lori F Lamel, Sophie Rosset, Jean-Luc S Gauvain, Samir K Bennacef (LIMSI-CNRS)

In the context of the LE-3 ARISE project we have been developing a dialog system for vocal access to rail travel information. The system provides schedule information for the main French intercity connections, as well as simulated fares and reservations, reductions and services. Our goal is to obtain high dialog success rates with a very open structure, where the user is free to ask any question or to provide any information at any point in time. In order to improve performance with such an open dialog strategy, we make use of implicit confirmation using the caller's wording (when possible), and change to a more constrained dialog level when the dialog is not going well. In addition to our own assessment, the prototype system undergoes periodic user evaluations carried out by our partners at the French Railways.

SP-16.4

Improving The Suitability Of Imperfect Transcriptions For Information Retrieval From Spoken Documents
Matthew A Siegler (Carnegie Mellon University), Michael J. Witbrock (Justsystem Pittsburgh Research Center)

Recently there has been a considerable focus on information retrieval for multimedia databases. When speech is used as the source material for multimedia indexing, the effect of transcriber error on retrieval effectiveness must be considered. This paper describes a method for measuring the relevance of documents to queries when information about the probability of word transcription error is available. To support the use of this technique, a method is presented for estimating word error probability in speech recognition engines that use word graphs (lattices). An information retrieval experiment using this technique on a large corpus of spoken documents is discussed. The method was able to reduce the difference in retrieval effectiveness between reference texts and hypothesized texts by 13%-38% depending on the size of the document set.
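One simple way to picture relevance scoring with word error probabilities is to replace raw term counts by expected counts, where each hypothesized word contributes its probability of being correct. This is a minimal sketch under that assumption, not the authors' formulation; the scoring function, the IDF variant and all names are hypothetical.

```python
import math


def expected_term_counts(hypothesis):
    """Sum, per word, the probability that each hypothesized occurrence is correct.

    hypothesis: list of (word, p_correct) pairs from a recognizer.
    """
    counts = {}
    for word, p_correct in hypothesis:
        counts[word] = counts.get(word, 0.0) + p_correct
    return counts


def relevance(query_terms, hypothesis, doc_freq, num_docs):
    """Score a spoken document by expected term count times a smoothed IDF weight."""
    counts = expected_term_counts(hypothesis)
    score = 0.0
    for term in query_terms:
        # Smoothed IDF (an assumed variant) that stays positive for any document frequency.
        idf = math.log(1.0 + num_docs / max(1, doc_freq.get(term, 0)))
        score += counts.get(term, 0.0) * idf
    return score


# Toy documents: the same words, recognized with different confidence.
doc_a = [("train", 0.9), ("ticket", 0.8)]
doc_b = [("train", 0.2), ("ticket", 0.8)]
doc_freq = {"train": 2, "ticket": 5}
score_a = relevance(["train"], doc_a, doc_freq, num_docs=10)
score_b = relevance(["train"], doc_b, doc_freq, num_docs=10)
```

With this weighting, a document in which the query word was probably misrecognized ranks below one in which it was recognized confidently.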

SP-16.5

Automatic Topic Identification for Two-Level Call Routing
John A Golden, Owen Kimball, Man-Hung Siu, Herbert Gish (BBN Technologies/GTE Internetworking)

This paper presents an approach to routing telephone calls automatically, based upon their speech content. Our data consist of a set of calls collected from a customer-service center with a two-level menu, which allows jumping past the second level, and we view the routing of these calls as a topic-identification problem. Our topic identifier employs a multinomial model for keyword occurrences. We describe the call-routing task in detail, discuss the multinomial model, and present experiments which investigate several issues that arise from using the model for this task.
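A multinomial keyword model for topic identification can be sketched as follows. The training data, vocabulary, add-alpha smoothing and uniform topic prior are assumptions made for illustration; the abstract does not specify how the model is estimated.

```python
import math
from collections import Counter


def train_multinomial(calls_by_topic, vocabulary, alpha=1.0):
    """Estimate per-topic keyword log-probabilities with add-alpha smoothing."""
    models = {}
    for topic, calls in calls_by_topic.items():
        counts = Counter(w for call in calls for w in call if w in vocabulary)
        total = sum(counts.values()) + alpha * len(vocabulary)
        models[topic] = {
            w: math.log((counts[w] + alpha) / total) for w in vocabulary
        }
    return models


def identify_topic(models, words):
    """Return the topic whose multinomial model gives the keywords the highest log-likelihood.

    Words outside the keyword vocabulary are ignored. The multinomial
    coefficient is omitted because it is the same for every topic.
    """
    def score(topic):
        log_probs = models[topic]
        return sum(log_probs[w] for w in words if w in log_probs)

    return max(models, key=score)


# Toy training calls, each reduced to a list of recognized words.
calls_by_topic = {
    "billing": [["bill", "charge", "payment"], ["bill", "refund"]],
    "repair": [["broken", "line", "repair"], ["repair", "noise", "line"]],
}
vocabulary = {
    "bill", "charge", "payment", "refund", "broken", "line", "repair", "noise",
}
models = train_multinomial(calls_by_topic, vocabulary)
```

A call would then be routed to the destination associated with `identify_topic(models, recognized_words)`.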

SP-16.6

Named Entity Tagged Language Models
Yoshihiko Gotoh, Steve Renals, Gethin Williams (University of Sheffield)

We introduce Named Entity (NE) Language Modelling, a stochastic finite state machine approach to identifying both words and NE categories from a stream of spoken data. We provide an overview of our approach to NE tagged language model (LM) generation together with results of the application of such an LM to the task of out-of-vocabulary (OOV) word reduction in large vocabulary speech recognition. Using the Wall Street Journal and Broadcast News corpora, it is shown that the tagged LM was able to reduce the overall word error rate by 14%, detecting up to 70% of previously OOV words. We also describe an example of the direct tagging of spoken data with NE categories.

SP-16.7

Speech Translation: Coupling of Recognition and Translation
Hermann Ney (Lehrstuhl fuer Informatik VI, RWTH Aachen, University of Technology, D-52056 Aachen, Germany)

In speech translation, we are faced with the problem of how to couple the speech recognition process and the translation process. Starting from the Bayes decision rule for speech translation, we analyze how the interaction between the recognition process and the translation process can be modelled. In the light of this decision rule, we discuss the already existing approaches to speech translation. None of the existing approaches seems to have addressed this direct interaction. We suggest two new methods, the local averaging approximation and the monotone alignments.
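For orientation, a Bayes decision rule for speech translation can be written as below. The notation is assumed here rather than taken from the abstract: x denotes the acoustic observations, f a source-language word sequence and e a target-language word sequence, and x is assumed to depend on e only through f.

```latex
\hat{e} = \arg\max_{e} \Pr(e \mid x)
        = \arg\max_{e} \Big\{ \Pr(e) \sum_{f} \Pr(f \mid e)\, \Pr(x \mid f) \Big\}
```

A fully decoupled (serial) system would instead first pick $\hat{f} = \arg\max_{f} \Pr(f)\,\Pr(x \mid f)$ and then translate it with $\hat{e} = \arg\max_{e} \Pr(e)\,\Pr(\hat{f} \mid e)$, which drops the sum over source sentences.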

SP-16.8

Probabilistic models for topic detection and tracking
Frederick G Walls, Hubert Jin, Sreenivasa Sista, Richard Schwartz (GTE/BBN Technologies)

We present probabilistic models for use in detecting and tracking topics in broadcast news stories. Our information retrieval (IR) models are formally explained. The Topic Detection and Tracking (TDT) initiative is discussed. The application of probabilistic models to the topic detection and tracking tasks is developed, and enhancements are discussed. We discuss four variations of these models, and we report our preliminary test results from the current TDT corpus.

Last Update: February 4, 1999 Ingo Höntsch