Speech Understanding and Systems


Using Non-Word Lexical Units In Automatic Speech Understanding

Authors:

Mikel Peñagarikano
German Bordel
Amparo Varona
Karmele López de Ipiña

Page (NA) Paper number 1698

Abstract:

If the objective of a continuous automatic speech understanding system is not speech-to-text translation, words are not strictly needed, and alternative lexical units (LUs) offer a new degree of freedom for improving system performance. We therefore experimentally explore several methods for automatically extracting a set of LUs from a Spanish training corpus and verify that the system improves in two ways: lower computational cost and higher recognition rates. Moreover, preliminary results indicate that, even when the target is speech-to-text translation, using non-word units and post-processing the output to produce the corresponding word chain outperforms the word-based system.

IC991698.PDF (From Author) IC991698.PDF (Rasterized)


Message-Driven Speech Recognition and Topic-Word Extraction

Authors:

Katsutoshi Ohtsuki
Sadaoki Furui
Atsushi Iwasaki
Naoyuki Sakurai

Page (NA) Paper number 1924

Abstract:

This paper proposes a new formulation for speech recognition/understanding systems, in which the a posteriori probability of the message that the speaker intends to convey, given an observed acoustic sequence, is maximized. This extends the current criterion, which maximizes the probability of a word sequence. Among the various possible representations, we employ a co-occurrence score of words, measured by mutual information, as the conditional probability of a word sequence occurring in a given message. The word sequence hypotheses obtained with bigram and trigram language models are rescored using the co-occurrence score. Experimental results show that this method improves word accuracy. Topic-words, which represent the content of a speech signal, are then extracted from the speech recognition results based on the significance score of each word. When five topic-words are extracted for each broadcast-news article, 82.8% of them are correct on average. This paper also proposes a verbalization-dependent language model, which is useful for Japanese dictation systems.
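
The rescoring step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes pointwise mutual information estimated from document-level co-occurrence counts, a simple additive combination with the first-pass score, and hypothetical names (`train_pmi`, `cooccurrence_score`, `rescore`) and toy data chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def train_pmi(documents):
    """Estimate pointwise mutual information for word pairs that
    co-occur in the same document (an assumed estimator)."""
    n_docs = len(documents)
    word_df = Counter()   # number of documents containing each word
    pair_df = Counter()   # number of documents containing each word pair
    for doc in documents:
        vocab = sorted(set(doc))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return {
        (a, b): math.log(n_ab * n_docs / (word_df[a] * word_df[b]))
        for (a, b), n_ab in pair_df.items()
    }


def cooccurrence_score(words, pmi):
    """Sum the PMI of every distinct word pair in a hypothesis;
    pairs never seen in training contribute 0."""
    vocab = sorted(set(words))
    return sum(pmi.get(pair, 0.0) for pair in combinations(vocab, 2))


def rescore(nbest, pmi, weight=1.0):
    """Pick the hypothesis with the best combined score.
    nbest is a list of (words, first_pass_log_score) tuples."""
    return max(nbest, key=lambda h: h[1] + weight * cooccurrence_score(h[0], pmi))


# Toy training documents and N-best list (illustrative only).
docs = [
    ["stock", "market", "fell"],
    ["stock", "market", "rose"],
    ["rain", "fell"],
    ["rain", "weather"],
]
pmi = train_pmi(docs)
nbest = [(["stock", "market"], -10.0), (["stock", "weather"], -9.9)]
best_words, best_score = rescore(nbest, pmi)
print(best_words)
```

With `weight=0.0` the first-pass ranking is left unchanged, so the weight controls how strongly the co-occurrence evidence can reorder hypotheses.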

IC991924.PDF (From Author) IC991924.PDF (Rasterized)


PROFER: Predictive, Robust Finite-State Parsing for Spoken Language

Authors:

Edward C Kaiser
Michael Johnston
Peter A Heeman

Page (NA) Paper number 2206

Abstract:

The natural language processing component of a speech understanding system is commonly a robust semantic parser, implemented either as a chart-based transition network or as a generalized left-right (GLR) parser. In contrast, we are developing a robust semantic parser that is a single, predictive finite-state machine. Our approach is motivated by our belief that such a finite-state parser can ultimately provide an efficient vehicle for tightly integrating higher-level linguistic knowledge into speech recognition. We report on the development of this parser, give an example of its use, and describe how it compares to both finite-state predictors and chart-based semantic parsers while combining elements of both.

IC992206.PDF (From Author) IC992206.PDF (Rasterized)


Usability Field-Test of a Spoken Data-Entry System

Authors:

Marcello Federico
Fabio Brugnara
Roberto Gretter

Page (NA) Paper number 2458

Abstract:

This paper reports on the field test of a speech-based data-entry system developed as a follow-up to an EC-funded project. The application domain is the data entry of personnel absence records from a large historical paper file (about 100,000 records). The application was requested by the personnel office of a public administration. The tested system proved both simple enough to make a detailed analysis feasible and sufficiently representative of the potential of spoken data entry.

IC992458.PDF (From Author) IC992458.PDF (Rasterized)


A Framework of Performance Evaluation And Error Analysis Methodology for Speech Understanding Systems

Authors:

Bor-Shen Lin, National Taiwan University (Taiwan)
Lin-Shan Lee, National Taiwan University and Academia Sinica (Taiwan)

Page (NA) Paper number 2273

Abstract:

With improved speech understanding technology, many successful working systems have been developed. However, the high degree of complexity and the wide variety of design methodologies make performance evaluation and error analysis for such systems very difficult. Metrics for individual modules, such as word accuracy, spotting rate, language model coverage, and slot accuracy, are often helpful, but on the basis of such metrics alone it is difficult to select or tune each module or to determine what percentage of understanding errors each module contributes. In this paper, a new framework for performance evaluation and error analysis for speech understanding systems is proposed, based on comparison with the 'best-matched' references obtained from the word graphs with the target words and tags given. In this framework, all test utterances can be classified by error type, and various understanding metrics can be obtained accordingly. Error analysis approaches based on an error plane are then proposed, with which the sources of understanding errors (e.g. poor acoustic recognition, poor language model, search error, etc.) can be identified for each utterance. Such a framework will be very helpful for the design and analysis of speech understanding systems.
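
As a point of reference for the module-level metrics mentioned above, the following minimal sketch computes word accuracy in the conventional way, as (N - S - D - I) / N over a minimum-edit-distance alignment of N reference words. It illustrates only this standard metric, not the proposed 'best-matched' reference framework or the error plane; the function name and the example utterance are hypothetical.

```python
def word_accuracy(reference, hypothesis):
    """Word accuracy = (N - S - D - I) / N, where S + D + I is the
    minimum number of substitutions, deletions, and insertions needed
    to turn the reference into the hypothesis. The value can be
    negative when the hypothesis contains many insertions."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # dist[i][j]: minimum edits between the first i reference words
    # and the first j hypothesis words.
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # i deletions
    for j in range(1, m + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = dist[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return (n - dist[n][m]) / n


# Illustrative utterance: one substitution and one deletion.
ref = "show flights from boston to denver".split()
hyp = "show flight from boston denver".split()
print(word_accuracy(ref, hyp))
```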

IC992273.PDF (From Author) IC992273.PDF (Rasterized)


Acoustic and Syntactical Modeling in the ATROS System

Authors:

D. Llorens, Unitat Predepartamental d'Informatica, Universitat Jaume I, Castello, Spain
F. Casacuberta, Dpto. Sistemas Informaticos y Computacion, Universidad Politecnica de Valencia, Valencia, Spain
E. Segarra, Dpto. Sistemas Informaticos y Computacion, Universidad Politecnica de Valencia, Valencia, Spain
J.A. Sánchez, Dpto. Sistemas Informaticos y Computacion, Universidad Politecnica de Valencia, Valencia, Spain
P. Aibar, Unitat Predepartamental d'Informatica, Universitat Jaume I, Castello, Spain
M.J. Castro, Dpto. Sistemas Informaticos y Computacion, Universidad Politecnica de Valencia, Valencia, Spain

Page (NA) Paper number 1551

Abstract:

Current speech technology allows us to build efficient speech recognition systems. However, learning the models of the knowledge sources in a speech recognition system is not a closed problem. In addition, lower computational requirements are crucial for building real-time systems. ATROS is an automatic speech recognition system whose acoustic, lexical, and syntactical models can be learnt automatically from training data by using similar techniques. In this paper, an improved version of ATROS that can deal with large smoothed language models and with large vocabularies is presented. This version supports acoustic and syntactical models trained with advanced grammatical inference techniques. It also incorporates new data structures and improved search algorithms to reduce the computational requirements of decoding. The system has been tested on a Spanish task of queries to a geographical database (with a vocabulary of 1,208 words).

IC991551.PDF (From Author) IC991551.PDF (Rasterized)


Towards A Robust Real-Time Decoder

Authors:

Jason C Davenport
Richard Schwartz
Long Nguyen

Page (NA) Paper number 2409

Abstract:

In this paper we present several algorithms that speed up our BBN BYBLOS decoder. We briefly describe the techniques we used prior to this year. We then present new techniques that speed up the recognition search by a factor of 10 with little effect on accuracy, using a combination of Fast Gaussian Computation, grammar spreading, and grammar caching within the 2-Pass n-best paradigm. We also describe our decoder metering strategy, which allows us to conveniently test for search errors. Finally, we describe a grammar compression technique that reduces the storage needed for each additional n-gram to only 10 bits.
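
To put the 10-bit figure in perspective, the short calculation below estimates the storage for a batch of additional n-grams. The count of one million n-grams and the 64-bit uncompressed record used for comparison are hypothetical values chosen for illustration; neither comes from the paper.

```python
def ngram_storage_bytes(num_ngrams, bits_per_ngram=10):
    """Bytes needed to store num_ngrams entries at bits_per_ngram
    bits each, rounded up to a whole byte."""
    total_bits = num_ngrams * bits_per_ngram
    return (total_bits + 7) // 8


extra_ngrams = 1_000_000  # hypothetical number of additional n-grams
compressed = ngram_storage_bytes(extra_ngrams)
# Hypothetical uncompressed record of 64 bits per n-gram, for comparison only.
baseline = ngram_storage_bytes(extra_ngrams, bits_per_ngram=64)
print(compressed, baseline, baseline / compressed)
```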

IC992409.PDF (From Author) IC992409.PDF (Rasterized)


A Statistical Text-To-Phone Function Using Ngrams and Rules

Authors:

William M Fisher

Page (NA) Paper number 1926

Abstract:

Adopting concepts from statistical language modeling and rule-based transformations can lead to effective and efficient text-to-phone (TTP) functions. We present here the methods and results of one such effort, resulting in a relatively compact and fast set of TTP rules that achieves 94.5% segmental phonemic accuracy.

IC991926.PDF (From Author) IC991926.PDF (Rasterized)
