Robust Speech Recognition and Adaptation

Time-Varying Noise Compensation Using Multiple Kalman Filters

Authors:

Nam Soo Kim

Page (NA) Paper number 1540

Abstract:

The environmental conditions in which a speech recognition system must operate are usually nonstationary. We present an approach to compensate for the effects of time-varying noise using a bank of Kalman filters. The presented method is based on the interacting multiple model (IMM) technique, which is well known in the area of multiple target tracking. Moreover, we propose a way to obtain fixed-interval smoothed estimates of the environmental parameters. The performance of the proposed approaches is evaluated in continuous digit recognition experiments in which both slowly evolving and rapidly varying noise sources are added to simulate noisy environments.
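As a rough illustration of the IMM idea named in the abstract, the sketch below runs one IMM cycle for a bank of scalar Kalman filters. It is not the paper's formulation: the random-walk state model, the function name `imm_step`, and all parameter values are assumptions chosen only to show the mixing, per-model filtering, and model-probability update steps.

```python
import math

def imm_step(z, xs, Ps, mu, qs, r, trans):
    """One IMM cycle for scalar random-walk models (illustrative assumption).

    z: observation; xs, Ps: per-model state means and variances;
    mu: model probabilities; qs: per-model process noise variances;
    r: observation noise variance; trans[i][j]: P(model j now | model i before).
    """
    n = len(xs)
    # Predicted model probabilities after the Markov model switch.
    c = [sum(trans[i][j] * mu[i] for i in range(n)) for j in range(n)]
    new_xs, new_Ps, like = [], [], []
    for j in range(n):
        # Mixing: blend all filters' estimates into the start state of model j.
        w = [trans[i][j] * mu[i] / c[j] for i in range(n)]
        x0 = sum(w[i] * xs[i] for i in range(n))
        P0 = sum(w[i] * (Ps[i] + (xs[i] - x0) ** 2) for i in range(n))
        # Kalman predict and update for the random-walk model x_t = x_{t-1} + noise.
        P_pred = P0 + qs[j]
        innov = z - x0
        S = P_pred + r
        K = P_pred / S
        new_xs.append(x0 + K * innov)
        new_Ps.append((1.0 - K) * P_pred)
        # Gaussian likelihood of the innovation under model j.
        like.append(math.exp(-0.5 * innov ** 2 / S) / math.sqrt(2.0 * math.pi * S))
    # Model probability update and combined estimate.
    post = [c[j] * like[j] for j in range(n)]
    total = sum(post)
    mu_new = [p / total for p in post]
    x_hat = sum(mu_new[j] * new_xs[j] for j in range(n))
    return x_hat, new_xs, new_Ps, mu_new
```

With one model given a small process noise and another a large one, the model probabilities shift toward the large-noise filter when the observed parameter changes abruptly, which is the behavior a filter bank is meant to provide for rapidly varying noise.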

IC991540.PDF (From Author) IC991540.PDF (Rasterized)


A Segment-based C0 Adaptation Scheme for PMC-based Noisy Mandarin Speech Recognition

Authors:

Wei-Tyng Hong
Sin-Horng Chen

Page (NA) Paper number 1607

Abstract:

A segment-based C0 (the zeroth-order cepstral coefficient) adaptation scheme for PMC-based Mandarin speech recognition is proposed in this paper. It incorporates a new C0 model of the speech signal into the PMC method to improve the gain matching between the clean-speech HMM models and the current noise model. The C0 model is constructed in the training phase by jointly modeling the normalized C0 with other MFCC recognition features to form C0-normalized HMM models. In the testing phase, the scheme pre-segments the input utterance into syllable-like segments, performs C0-denormalization operations to expand the C0-normalized HMM models, and uses them in the PMC method. Compared with the conventional PMC method, the proposed method achieves a much better noise compensation effect because it uses more precise gain matching in the PMC model combination. Experimental results show that the base-syllable accuracy rate improves significantly for continuous noisy Mandarin speech recognition.

IC991607.PDF (From Author) IC991607.PDF (Rasterized)


Improved Parallel Model Combination Techniques With Split Gaussian Mixtures For Speech Recognition Under Noisy Conditions

Authors:

Jeih-Weih Hung, Dept of Electrical Engineering, National Taiwan University (Taiwan)
Jia-Lin Shen
Lin-Shan Lee, Dept of Electrical Engineering, National Taiwan University (Taiwan)

Page (NA) Paper number 2151

Abstract:

The parallel model combination (PMC) technique has been very successful and is frequently used to improve the performance of speech recognition systems in noisy environments. This approach assumes that the log spectrum of speech signals is Gaussian-distributed, which is not always valid, especially when the HMMs have few mixtures. In this paper, a simple approach is proposed to improve the PMC method by splitting the mixtures before the domain transformation process in PMC is performed, and merging the mixtures back to the original number after the PMC processes are completed. Preliminary experimental results show that the increased number of mixtures during the PMC processes can in fact provide significant improvements in recognition accuracy over the original PMC method, especially when the SNR is low.
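The Gaussian assumption and the domain transformation mentioned in the abstract can be illustrated with the widely used log-normal approximation for combining one speech Gaussian with one noise Gaussian. The sketch below is a scalar, single-dimension simplification and does not implement the paper's mixture splitting and merging; the function names and the `gain` parameter are assumptions made for this example.

```python
import math

def log_to_linear(mean_log, var_log):
    """Moments of exp(X) when X ~ N(mean_log, var_log) (log-normal moments)."""
    mean_lin = math.exp(mean_log + 0.5 * var_log)
    var_lin = mean_lin ** 2 * (math.exp(var_log) - 1.0)
    return mean_lin, var_lin

def linear_to_log(mean_lin, var_lin):
    """Inverse mapping: Gaussian log-domain parameters matching the linear moments."""
    var_log = math.log(var_lin / mean_lin ** 2 + 1.0)
    mean_log = math.log(mean_lin) - 0.5 * var_log
    return mean_log, var_log

def combine(speech, noise, gain=1.0):
    """Combine (mean, variance) pairs given in the log-spectral domain.

    Speech and noise are assumed additive and independent in the linear
    spectral domain, so their means and variances add there.
    """
    mean_s, var_s = log_to_linear(*speech)
    mean_n, var_n = log_to_linear(*noise)
    return linear_to_log(mean_s + gain * mean_n, var_s + gain ** 2 * var_n)

# Hypothetical log-spectral parameters for one speech and one noise component.
noisy_mean, noisy_var = combine((2.0, 0.3), (0.0, 0.2))
```

The result is again forced into a single Gaussian in the log domain, which is the step where the approximation error arises when a component covers a broad region of the log spectrum.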

IC992151.PDF (From Author) IC992151.PDF (Rasterized)


Speech Recognition and Enhancement by A Nonstationary AR HMM with Gain Adaptation Under Unknown Noise

Authors:

Gunther Ruske, Inst. for Human-Machine-Communication, Munich University of Technology, Germany (Germany)
Ki Yong Lee, School of Electronic Engineering, Soongsil University, 1-1 Sangdo-5Dong, Dongjak-Ku, Seoul, 156-743 Korea (Korea)

Page (NA) Paper number 1425

Abstract:

In this paper, a gain-adapted speech recognition method for unknown noise is developed in the time domain. The noise is assumed to be colored. A nonstationary autoregressive (NAR) hidden Markov model (HMM) is used to model clean speech. The nonstationary AR process is modeled by polynomial functions with a linear combination of M known basis functions. Multiple Kalman filters are used for enhancement, for the gain contour of speech, and for estimation of the noise model when only the noisy signal is available.

IC991425.PDF (Scanned)


Database And Online Adaptation For Improved Speech Recognition In Car Environments

Authors:

Alexander Fischer, Philips Research Laboratories, Aachen, Germany (Germany)
Volker Stahl, Philips Research Laboratories Aachen, Germany (Germany)

Page (NA) Paper number 1449

Abstract:

Data collection in the car environment requires much more effort, in terms of cost and time, than in the telephone or office environment. Therefore, we apply supervised database adaptation from the telephone environment to the car environment to allow quick setup of car environment recognizers. Further reduction of the word error rate is obtained by unsupervised online adaptation during recognition. We investigate the common techniques MLLR and MAP for that purpose. We give results on command word recognition in the car environment for all combinations of database and online adaptation in task-dependent and task-independent scenarios. The possibility of setting up speech recognizers for the car environment based on telephone data and a limited amount of adaptation material from the car environment is demonstrated.
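For readers unfamiliar with MAP adaptation, the textbook MAP update of a single Gaussian mean interpolates between a prior mean (for example, from telephone-trained models) and the adaptation data (for example, from the car). The sketch below is a scalar illustration only, not the authors' system; the function name `map_mean` and the prior weight `tau` are assumptions.

```python
def map_mean(prior_mean, frames, tau=10.0):
    """MAP estimate of a Gaussian mean with a conjugate prior.

    prior_mean: mean of the existing (prior) model;
    frames: adaptation observations assigned to this Gaussian;
    tau: prior weight, acting like a count of pseudo-observations.
    """
    n = len(frames)
    return (tau * prior_mean + sum(frames)) / (tau + n)

# Hypothetical values: the estimate moves from the prior toward the data mean.
few = map_mean(0.0, [1.0] * 10)    # equal weight on prior and data
many = map_mean(0.0, [1.0] * 990)  # dominated by the adaptation data
```

With little adaptation material the estimate stays close to the prior model, and with more material it approaches the mean of the new data, which is why MAP suits setups with a limited amount of in-car data.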

IC991449.PDF (From Author) IC991449.PDF (Rasterized)


Training of HMM with Filtered Speech Material for Hands-free Recognition

Authors:

Diego Giuliani
Marco Matassoni
Maurizio Omologo
Piergiorgio Svaizer

Page (NA) Paper number 1895

Abstract:

This paper addresses the problem of hands-free speech recognition in a noisy office environment. An array of six omnidirectional microphones and a corresponding time delay compensation module are used to provide a beamformed signal as input to an HMM-based recognizer. Training of HMMs is performed using either a clean speech database or a filtered version of the same database. The filtering consists of a convolution with the acoustic impulse response between speaker and microphone, to reproduce the reverberation effect. Background noise is added to provide the desired SNR. The paper shows that the new models trained on these data perform better than the baseline ones. Furthermore, the paper investigates MLLR adaptation of the new models. It is shown that a further performance improvement is obtained, reaching a 98.7% WRR in a connected digit recognition task when the talker is at 1.5 m distance from the array.
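The data preparation described above (convolution with an impulse response, then adding noise at a chosen SNR) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy signals, and the choice to measure SNR against the reverberated signal are all assumptions.

```python
import math

def convolve(signal, impulse_response):
    """Direct-form linear convolution (full length)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[i + k] += s * h
    return out

def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def add_noise_at_snr(signal, noise, snr_db):
    """Scale noise so that signal power / noise power equals the target SNR.

    Assumes noise has at least as many samples as signal.
    """
    noise = noise[:len(signal)]
    gain = math.sqrt(power(signal) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(signal, noise)]

# Toy example with hypothetical data: a short "clean" signal, a two-tap
# impulse response, and a deterministic stand-in for background noise.
clean = [math.sin(0.3 * t) for t in range(200)]
reverberant = convolve(clean, [1.0, 0.5])
noise = [math.cos(1.7 * t) for t in range(len(reverberant))]
training_signal = add_noise_at_snr(reverberant, noise, snr_db=15.0)
```

In practice the impulse response would be measured in the target room and the noise recorded there, so that the training material matches the hands-free test condition.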

IC991895.PDF (From Author) IC991895.PDF (Rasterized)


Incremental Enrollment of Speech Recognizers

Authors:

Chafic E Mokbel, France Telecom - CNET - DIH/DIPS (Currently at IDIAP) (France)
Olivier Collin, France Telecom - CNET - DIH/DIPS (France)

Page (NA) Paper number 1468

Abstract:

Classical adaptation approaches generally allow a reliably trained model to match a particular condition. In this paper, we define an incremental version of the segmental-EM algorithm. This method makes it possible to incrementally enrich a model first trained with a limited amount of data. Memory constraints allow only the statistics of the initial data to be stored. The proposed method uses these statistics by fixing, within the segmental EM algorithm applied to both initial and new data, the initial optimal paths in the model for the initial data. We prove theoretically that this is equivalent to segmental MAP adaptation with a specific choice of priors. In experiments on two speaker-dependent telephone databases, the approach made it possible to incrementally integrate new conditions of use. The performance was slightly lower than that obtained with classical training over the whole data. As expected from the MAP interpretation of the algorithm, the characteristics of the initial data largely influence the model evolution.
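The idea of keeping only statistics of the initial data, rather than the data itself, can be illustrated with the sufficient statistics of a single scalar Gaussian. This sketch is not the segmental-EM algorithm of the paper: it assumes the frame-to-state alignment is already given, and the class name `GaussianStats` and its methods are hypothetical.

```python
class GaussianStats:
    """Sufficient statistics (count, sum, sum of squares) for one Gaussian."""

    def __init__(self):
        self.n = 0
        self.s1 = 0.0
        self.s2 = 0.0

    def add(self, frames):
        """Accumulate frames already aligned to this Gaussian."""
        for x in frames:
            self.n += 1
            self.s1 += x
            self.s2 += x * x
        return self

    def merged(self, other):
        """Statistics of the pooled data, without access to the raw frames."""
        out = GaussianStats()
        out.n = self.n + other.n
        out.s1 = self.s1 + other.s1
        out.s2 = self.s2 + other.s2
        return out

    def mean(self):
        return self.s1 / self.n

    def variance(self):
        """Maximum-likelihood variance estimate."""
        m = self.mean()
        return self.s2 / self.n - m * m

# The initial frames can be discarded once their statistics are stored.
initial = GaussianStats().add([1.0, 2.0, 3.0])
update = GaussianStats().add([4.0, 5.0])
enriched = initial.merged(update)
```

Because the stored statistics of the initial data never change, they act like a fixed prior contribution when new data arrive, which is consistent with the MAP interpretation given in the abstract.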

IC991468.PDF (From Author) IC991468.PDF (Rasterized)


Automatic Speech Recognition: A Communication Perspective

Authors:

Bishnu S Atal, AT&T Labs, Florham Park, NJ 07932, USA (USA)

Page (NA) Paper number 1910

Abstract:

Speech recognition is usually regarded as a problem in the field of pattern recognition, where one first estimates the probability density function of each pattern to be recognized and then uses Bayes' theorem to identify the pattern that provides the highest likelihood for the observed speech data. In this paper, we will take a different approach to this problem. In speech recognition, the goal is communication of information by voice, and we will discuss the basics of speech recognition from a communication perspective. The speech signal at the acoustic level has a bit rate of 64 kb/s, but the underlying sound patterns have an information rate of less than 100 b/s. What is the role of this high bit rate at the acoustic level? We will discuss the principles of decoding patterns that are submerged in an ocean of seemingly irrelevant information.
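The two rates quoted in the abstract can be put side by side with a short calculation. The 64 kb/s figure matches standard telephone PCM (8000 samples per second at 8 bits per sample); treating it that way is an assumption here, as the abstract does not say how the figure was derived. The function name is hypothetical.

```python
def rate_ratio(acoustic_bps, information_bps):
    """How many acoustic bits are transmitted per bit of underlying information."""
    return acoustic_bps / information_bps

# Assumed derivation of the acoustic rate: telephone-band PCM.
sample_rate_hz = 8000
bits_per_sample = 8
acoustic_bps = sample_rate_hz * bits_per_sample  # 64000 b/s

# The abstract gives "less than 100 b/s", so 100 is an upper bound
# and the resulting ratio is a lower bound.
ratio_lower_bound = rate_ratio(acoustic_bps, 100)
```

Under these assumptions the acoustic signal carries at least several hundred times more bits than the sound patterns it conveys, which is the gap the abstract asks about.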

IC991910.PDF (From Author) IC991910.PDF (Rasterized)
