Speech Understanding

Incorporating Confidence Measures in the Dutch Train Timetable Information System Developed in the ARISE Project

Authors:

Gies Bouwman,
Janienke Sturm,
Louis Boves,

Page (NA) Paper number 1504

Abstract:

Using Confidence Measures (CMs) in Spoken Dialog System (SDS) applications to reduce the number of verification turns for 'reliably correctly recognised utterances' can greatly reduce average dialog length, which enhances usability and increases user satisfaction [1]. This paper briefly reviews the method of CM assessment presented in [2]. It then demonstrates how the Dutch ARISE (Automatic Railways Information Systems in Europe) SDS was equipped with this technology and shows in detail how the parameters involved are to be optimised. The evaluation reveals and explains a typical behaviour of this method with systems that resemble train timetable information services. This results in a set of conclusions that were not foreseen when the method was first developed for a directory information system. The paper ends with an outlook on solutions in new research directions.
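The idea of skipping verification turns for reliably recognised utterances can be illustrated with a minimal sketch. It assumes each recognised slot carries a confidence score in [0, 1] and that a single threshold decides whether to confirm; the class name, function names, and the threshold value are hypothetical and are not taken from the ARISE system.

```python
from dataclasses import dataclass


@dataclass
class SlotHypothesis:
    slot: str          # e.g. "destination" (hypothetical slot name)
    value: str         # value returned by the recogniser
    confidence: float  # confidence measure, assumed to lie in [0, 1]


def needs_verification(hyp, threshold):
    """Return True when the caller should be asked to confirm this slot."""
    return hyp.confidence < threshold


def plan_turns(hyps, threshold=0.8):
    """Split hypotheses into those accepted silently and those to verify."""
    accepted = [h for h in hyps if not needs_verification(h, threshold)]
    to_verify = [h for h in hyps if needs_verification(h, threshold)]
    return accepted, to_verify


hyps = [
    SlotHypothesis("destination", "Utrecht", 0.95),
    SlotHypothesis("departure_time", "09:00", 0.40),
]
accepted, to_verify = plan_turns(hyps, threshold=0.8)
```

Raising the threshold trades longer dialogs for fewer silently accepted errors, which is the kind of parameter the abstract says must be optimised.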

IC991504.PDF (From Author) IC991504.PDF (Rasterized)


HMM and Neural Network based Speech Act Detection

Authors:

Klaus Ries,

Page (NA) Paper number 2173

Abstract:

We present an incremental lattice generation approach to speech act detection for spontaneous and overlapping speech in telephone conversations (CallHome Spanish). At each stage of the process it is therefore possible to use different models after the initial HMM models have generated a reasonable set of hypotheses. These lattices can then be processed further by more complex models. This study shows how neural networks can be used very effectively in the classification of speech acts. We find that speech acts can be classified better using the neural net based approach than using the more classical ngram backoff model approach. The best resulting neural network operates only on unigrams, and integrating the ngram backoff model as a prior reduces the performance of the model. The neural network is therefore more likely to be robust against errors from an LVCSR system and can potentially be trained from a smaller database.

IC992173.PDF (From Author) IC992173.PDF (Rasterized)


The LIMSI ARISE System for Train Travel Information

Authors:

Lori F Lamel,
Sophie Rosset,
Jean-Luc S Gauvain,
Samir K Bennacef,

Page (NA) Paper number 2240

Abstract:

In the context of the LE-3 ARISE project we have been developing a dialog system for vocal access to rail travel information. The system provides schedule information for the main French intercity connections, as well as simulated fares and reservations, reductions and services. Our goal is to obtain high dialog success rates with a very open structure, where the user is free to ask any question or to provide any information at any point in time. In order to improve performance with such an open dialog strategy, we make use of implicit confirmation using the caller's wording (when possible), and change to a more constrained dialog level when the dialog is not going well. In addition to our own assessment, the prototype system undergoes periodic user evaluations carried out by our partners at the French Railways.

IC992240.PDF (From Author) IC992240.PDF (Rasterized)


Improving The Suitability Of Imperfect Transcriptions For Information Retrieval From Spoken Documents

Authors:

Matthew A Siegler,
Michael J. Witbrock,

Page (NA) Paper number 2442

Abstract:

Recently there has been a considerable focus on information retrieval for multimedia databases. When speech is used as the source material for multimedia indexing, the effect of transcriber error on retrieval effectiveness must be considered. This paper describes a method for measuring the relevance of documents to queries when information about the probability of word transcription error is available. To support the use of this technique, a method is presented for estimating word error probability in speech recognition engines that use word graphs (lattices). An information retrieval experiment using this technique on a large corpus of spoken documents is discussed. The method was able to reduce the difference in retrieval effectiveness between reference texts and hypothesized texts by 13%-38% depending on the size of the document set.
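One way to picture relevance scoring under transcription uncertainty is to replace raw term counts with expected counts, where each hypothesised word contributes its estimated probability of being correct. The sketch below assumes that formulation and a simple weighted-sum score; the function names and the inverse-document-frequency weights are illustrative and are not the authors' published formula.

```python
from collections import defaultdict


def expected_term_counts(hyp_words):
    """Sum the probability of correctness for every hypothesised word.

    hyp_words: iterable of (word, p_correct) pairs, where p_correct is an
    assumed per-word estimate derived, for example, from a word lattice.
    """
    counts = defaultdict(float)
    for word, p_correct in hyp_words:
        counts[word] += p_correct
    return dict(counts)


def relevance(query_terms, doc_counts, idf):
    """Score a document as the idf-weighted sum of expected term counts."""
    return sum(doc_counts.get(t, 0.0) * idf.get(t, 0.0) for t in query_terms)


counts = expected_term_counts([("train", 0.9), ("train", 0.5), ("rain", 0.2)])
score = relevance(["train", "rain"], counts, {"train": 2.0, "rain": 1.0})
```

A word that the recogniser is unsure about then influences the ranking less than one it is confident about, instead of counting fully or not at all.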

IC992442.PDF (From Author) IC992442.PDF (Rasterized)


Automatic Topic Identification for Two-Level Call Routing

Authors:

John A Golden,
Owen Kimball,
Man-Hung Siu,
Herbert Gish,

Page (NA) Paper number 2468

Abstract:

This paper presents an approach to routing telephone calls automatically, based upon their speech content. Our data consist of a set of calls collected from a customer-service center with a two-level menu, which allows jumping past the second level, and we view the routing of these calls as a topic-identification problem. Our topic identifier employs a multinomial model for keyword occurrences. We describe the call-routing task in detail, discuss the multinomial model, and present experiments which investigate several issues that arise from using the model for this task.
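A multinomial keyword model for topic identification can be sketched as follows. This is a generic illustration with add-alpha smoothing and hypothetical topic names and keywords; the abstract does not give the paper's smoothing, keyword selection, or handling of the two-level menu, so none of that is represented here.

```python
import math
from collections import Counter


def train(calls_by_topic, vocabulary, alpha=1.0):
    """Estimate a log prior and smoothed keyword log-probabilities per topic.

    calls_by_topic: {topic: [keyword list for each training call]}
    """
    total_calls = sum(len(calls) for calls in calls_by_topic.values())
    model = {}
    for topic, calls in calls_by_topic.items():
        counts = Counter(w for call in calls for w in call if w in vocabulary)
        denom = sum(counts.values()) + alpha * len(vocabulary)
        log_probs = {w: math.log((counts[w] + alpha) / denom) for w in vocabulary}
        model[topic] = (math.log(len(calls) / total_calls), log_probs)
    return model


def classify(model, keywords):
    """Pick the topic with the highest multinomial log-likelihood plus prior.

    The multinomial coefficient is omitted because it is identical for
    every topic and does not change the ranking.
    """
    def log_score(topic):
        log_prior, log_probs = model[topic]
        return log_prior + sum(log_probs[w] for w in keywords if w in log_probs)
    return max(model, key=log_score)


training = {
    "billing": [["bill", "payment"], ["bill", "charge"]],
    "repair": [["broken", "line"], ["line", "noise"]],
}
vocab = {"bill", "payment", "charge", "broken", "line", "noise"}
model = train(training, vocab)
```

Calling `classify(model, ["bill", "charge"])` routes the call to the topic whose keyword distribution best explains the observed keywords.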

IC992468.PDF (From Author) IC992468.PDF (Rasterized)


Named Entity Tagged Language Models

Authors:

Yoshihiko Gotoh,
Steve Renals,
Gethin Williams,

Page (NA) Paper number 1984

Abstract:

We introduce Named Entity (NE) Language Modelling, a stochastic finite state machine approach to identifying both words and NE categories from a stream of spoken data. We provide an overview of our approach to NE tagged language model (LM) generation together with results of the application of such a LM to the task of out-of-vocabulary (OOV) word reduction in large vocabulary speech recognition. Using the Wall Street Journal and Broadcast News corpora, it is shown that the tagged LM was able to reduce the overall word error rate by 14%, detecting up to 70% of previously OOV words. We also describe an example of the direct tagging of spoken data with NE categories.

IC991984.PDF (From Author) IC991984.PDF (Rasterized)


Speech Translation: Coupling of Recognition and Translation

Authors:

Hermann Ney, Lehrstuhl fuer Informatik VI, RWTH Aachen, University of Technology, D-52056 Aachen, Germany

Page (NA) Paper number 1675

Abstract:

In speech translation, we are faced with the problem of how to couple the speech recognition process and the translation process. Starting from the Bayes decision rule for speech translation, we analyze how the interaction between the recognition process and the translation process can be modelled. In the light of this decision rule, we discuss the already existing approaches to speech translation. None of the existing approaches seems to have addressed this direct interaction. We suggest two new methods, the local averaging approximation and the monotone alignments.
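The starting point mentioned above can be written out as a sketch. The symbols are assumptions introduced here, not notation quoted from the paper: x is the acoustic signal, f a source-language word sequence, and e a target-language word sequence. The final approximation further assumes that the acoustic signal depends only on the source sentence.

```latex
\hat{e} = \arg\max_{e} \Pr(e \mid x)
        = \arg\max_{e} \bigl\{ \Pr(e) \cdot \Pr(x \mid e) \bigr\}

\Pr(x \mid e) = \sum_{f} \Pr(f \mid e) \, \Pr(x \mid f, e)
        \approx \sum_{f} \Pr(f \mid e) \, \Pr(x \mid f)
```

Under this reading, a pipeline that first picks the single best recognised sentence and then translates it replaces the sum over f with one term, which is where the coupling between recognition and translation is lost.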

IC991675.PDF (From Author) IC991675.PDF (Rasterized)


Probabilistic Models For Topic Detection And Tracking

Authors:

Frederick G Walls,
Hubert Jin,
Sreenivasa Sista,
Richard Schwartz,

Page (NA) Paper number 2404

Abstract:

We present probabilistic models for use in detecting and tracking topics in broadcast news stories. Our information retrieval (IR) models are formally explained. The Topic Detection and Tracking (TDT) initiative is discussed. The application of probabilistic models to the topic detection and tracking tasks is developed, and enhancements are discussed. We discuss four variations of these models, and we report our preliminary test results from the current TDT corpus.

IC992404.PDF (From Author) IC992404.PDF (Rasterized)
