Robust Speech Recognition in Noisy Environments

The Teager Energy Based Feature Parameters for Robust Speech Recognition in Car Noise

Authors:

Firas Jabloun, INRS-telecommunications (Canada)
Ahmet Enis Çetin, Bilkent University (Turkey) (U.K.)

Page (NA) Paper number 1013

Abstract:

In this paper, a new set of speech feature parameters based on multirate signal processing and the Teager Energy Operator is developed. The speech signal is first divided into nonuniform subbands on the mel scale using a multirate filter bank; the Teager energies of the subband signals are then estimated. Finally, the feature vector is constructed by log-compression and inverse DCT computation. The new feature parameters yield robust speech recognition performance in car engine noise, which is low-pass in nature.
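
As a rough illustration of the middle steps of this pipeline, the sketch below applies the discrete Teager Energy Operator in its commonly cited form, psi[x(n)] = x(n)^2 - x(n-1)x(n+1), to one subband signal and log-compresses the result. The filter bank and the final DCT stage are not shown. The function names, the averaging of absolute values, and the numerical floor are assumptions made for this example, not details taken from the paper.

```python
import math

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def log_teager_feature(subband, floor=1e-12):
    """Log of the mean absolute Teager energy of one subband frame.

    Averaging absolute values and applying a floor are assumptions of
    this sketch; the paper may estimate the energy differently.
    """
    psi = teager_energy(subband)
    mean_energy = sum(abs(v) for v in psi) / len(psi)
    return math.log(max(mean_energy, floor))

# For a pure tone A*cos(w*n), the operator is constant: A^2 * sin(w)^2.
A, w = 2.0, 0.3
tone = [A * math.cos(w * n) for n in range(64)]
psi = teager_energy(tone)
feature = log_teager_feature(tone)
```

Because the operator's output for a tone scales with sin(w)^2, low-frequency components contribute less energy than high-frequency components of the same amplitude, which may help explain the reported robustness to low-pass noise.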

IC991013.PDF (From Author) IC991013.PDF (Rasterized)


Avoiding Distortions Due To Speech Coding And Transmission Errors In GSM ASR Tasks

Authors:

Ascensión Gallardo-Antolín
Fernando Díaz-de-María
Francisco Valverde-Albacete

Page (NA) Paper number 1443

Abstract:

In this paper, we extend our previous research on a new approach to ASR in the GSM environment. Instead of recognizing from the decoded speech signal, our system works from the digital speech representation used by the GSM encoder. We compare the performance of a conventional system and the proposed one on a speaker-independent, isolated-digit ASR task. For the half-rate and full-rate GSM codecs, our results show that the proposed approach is much more effective in coping with coding distortion and transmission errors. Furthermore, in clean speech conditions, our approach does not degrade recognition performance, even though it recognizes from GSM digital speech, in comparison with a conventional system working on unencoded speech.

IC991443.PDF (From Author) IC991443.PDF (Rasterized)


Binaural Bark Subband Preprocessing Of Nonstationary Signals For Noise Robust Speech Feature Extraction

Authors:

Mike Peters, BMW AG Research and Development 80788 Munich, Germany (Germany)

Page (NA) Paper number 1874

Abstract:

A two-channel approach to noise-robust feature extraction for speech recognition in the car is proposed. The coherence function within the Bark subbands of the MFCC transform is calculated to estimate the spectral similarity of two stochastic processes. It is illustrated how the coherence of speech in binaural signals is used to increase robustness against incoherent noise. The introduced preprocessing of nonstationary signals from two microphones results in an additive correction term for the mel-frequency cepstral coefficients.
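
To make the notion of subband coherence concrete, the sketch below computes a magnitude-squared coherence estimate from the complex spectral values of two channels pooled over one subband. The function name and the pooling over a list of values are assumptions for illustration; the paper's estimator and its mapping to an MFCC correction term are not reproduced here.

```python
def subband_coherence(left, right):
    """Magnitude-squared coherence of two channels within one subband.

    left, right: equal-length lists of complex spectral values (for
    example, the bins of one Bark band collected over several frames).
    Returns a value between 0 (incoherent) and 1 (fully coherent).
    """
    cross = sum(l * r.conjugate() for l, r in zip(left, right))
    power_left = sum(abs(l) ** 2 for l in left)
    power_right = sum(abs(r) ** 2 for r in right)
    if power_left == 0 or power_right == 0:
        return 0.0  # assumption: treat an empty band as incoherent
    return abs(cross) ** 2 / (power_left * power_right)

# A channel that is a scaled, phase-shifted copy of the other is coherent.
speech_left = [1 + 1j, 2 - 1j, -0.5 + 3j]
speech_right = [0.5j * v for v in speech_left]
coherent = subband_coherence(speech_left, speech_right)

# Values whose cross terms cancel give zero coherence.
noise_left = [1 + 0j, 0 + 1j]
noise_right = [1 + 0j, 0 - 1j]
incoherent = subband_coherence(noise_left, noise_right)
```

A coherence near 1 suggests a common source such as the talker, while values near 0 suggest uncorrelated noise at the two microphones.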

IC991874.PDF (From Author) IC991874.PDF (Rasterized)


Speaker Normalized Spectral Subband Parameters for Noise Robust Speech Recognition

Authors:

Satoru Tsuge, ATR-ITL (Japan)
Toshiaki Fukada, ATR-ITL (Japan)
Harald Singer, ATR-ITL (Japan)

Page (NA) Paper number 1686

Abstract:

This paper proposes speaker-normalized spectral subband centroids (SSCs) as supplementary features for speech recognition in noisy environments. SSCs are computed as frequency centroids for each subband from the power spectrum of the speech signal. Since conventional SSCs depend on the formant frequencies of a speaker, we introduce a speaker normalization technique into the SSC computation to reduce speaker variability. Experimental results on spontaneous speech recognition show that the speaker-normalized SSCs are more useful than conventional SSCs as supplementary features for improving recognition performance.
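
The conventional SSC computation described above can be sketched as a power-weighted mean frequency per subband. The uniform band edges, the fallback for empty bands, and all names below are assumptions for this example; the speaker normalization step is not shown because the abstract does not specify it.

```python
def subband_centroids(power, freqs, edges):
    """Power-weighted frequency centroid of each subband.

    power[i] is the power spectrum value at frequency freqs[i] (Hz);
    subband m covers the half-open interval [edges[m], edges[m + 1]).
    """
    centroids = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        num = 0.0
        den = 0.0
        for f, p in zip(freqs, power):
            if lo <= f < hi:
                num += f * p
                den += p
        # Assumption: fall back to the band midpoint when the band is empty.
        centroids.append(num / den if den > 0 else 0.5 * (lo + hi))
    return centroids

freqs = [0.0, 500.0, 1000.0, 1500.0, 2000.0, 2500.0, 3000.0, 3500.0]
power = [1.0, 3.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0]
edges = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
ssc = subband_centroids(power, freqs, edges)
```

Since a centroid tracks where the energy sits within a band, it moves with a speaker's formant positions, which is the dependence the proposed normalization aims to reduce.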

IC991686.PDF (From Author) IC991686.PDF (Rasterized)


TempoRAl Patterns (TRAPs) In ASR Of Noisy Speech

Authors:

Hynek Hermansky
Sangita Sharma

Page (NA) Paper number 2427

Abstract:

In this paper we study a new approach to processing temporal information for automatic speech recognition (ASR). Specifically, we study the use of rather long-time TempoRAl Patterns (TRAPs) of spectral energies in place of the conventional spectral patterns for ASR. The proposed Neural TRAPs are found to yield a significant amount of information complementary to that of the conventional spectral-feature-based ASR system. A combination of these two ASR systems is shown to result in improved robustness to several types of additive and convolutive environmental degradations.
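
The difference between a conventional spectral pattern and a temporal pattern can be sketched as two ways of slicing the same time-frequency matrix. The names, the context width, and the edge handling below are assumptions for illustration only; the neural classifiers that the paper applies to these patterns are not shown.

```python
def spectral_pattern(energies, frame):
    """Conventional pattern: all band energies at a single frame."""
    return list(energies[frame])

def temporal_pattern(energies, band, center, half_width):
    """Temporal pattern: one band's energy over a window of frames.

    energies[t][k] is the (log) energy of band k at frame t. Frames
    outside the utterance are clamped to the nearest valid frame,
    which is an assumption of this sketch.
    """
    last = len(energies) - 1
    frames = range(center - half_width, center + half_width + 1)
    return [energies[min(max(t, 0), last)][band] for t in frames]

# Toy matrix: 7 frames, 3 bands, with energies[t][k] = t + 10 * k.
energies = [[t + 10 * k for k in range(3)] for t in range(7)]
across_bands = spectral_pattern(energies, frame=3)
across_time = temporal_pattern(energies, band=1, center=3, half_width=2)
```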

IC992427.PDF (From Author) IC992427.PDF (Rasterized)


Signal Modeling for Isolated Word Recognition

Authors:

Montri Karnjanadecha
Stephen A Zahorian

Page (NA) Paper number 2036

Abstract:

This paper presents speech signal modeling techniques which are well suited to high-performance and robust isolated word recognition. Speech is encoded by a discrete cosine transform of its spectra, after several preprocessing steps. Temporal information is then also explicitly encoded into the feature set. We present a new technique for incorporating this temporal information as a function of temporal position within each word. We tested features computed with this method using an alphabet recognition task based on the ISOLET database. The HTK toolkit was used to implement the isolated word recognizer with whole-word HMM models. The best result obtained with 50 features on speaker-independent alphabet recognition was 98.0%. Gaussian noise was added to the original speech to simulate a noisy environment. We achieved a recognition accuracy of 95.8% at an SNR of 15 dB. We also tested our recognizer with simulated telephone-quality speech by adding noise and band-limiting the original speech. For this "telephone" speech, our recognizer achieved 89.6% recognition accuracy. The recognizer was also tested in a speaker-dependent mode, resulting in 97.4% accuracy on test data.
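
Adding Gaussian noise at a prescribed SNR, as in the 15 dB condition above, can be sketched as follows. The function name, the seeding, and the exact scaling of the noise realization are assumptions of this example rather than the authors' procedure.

```python
import math
import random

def add_gaussian_noise(signal, snr_db, seed=0):
    """Return signal plus white Gaussian noise scaled to the given SNR in dB."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(v * v for v in noise) / len(noise)
    # SNR_dB = 10 * log10(signal_power / target_noise_power)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * v for s, v in zip(signal, noise)]

# Toy "speech": a sinusoid standing in for a real waveform.
clean = [math.sin(0.05 * n) for n in range(2000)]
noisy = add_gaussian_noise(clean, snr_db=15.0)

# Measure the SNR that was actually realized.
clean_power = sum(s * s for s in clean) / len(clean)
error_power = sum((y - s) ** 2 for y, s in zip(noisy, clean)) / len(clean)
measured_snr_db = 10.0 * math.log10(clean_power / error_power)
```

Scaling the specific noise realization, rather than relying on its nominal variance, makes the measured SNR match the target up to rounding error.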

IC992036.PDF (From Author) IC992036.PDF (Rasterized)


Transforming HMMs For Speaker-Independent Hands-Free Speech Recognition in the Car

Authors:

Yifan Gong
John J. Godfrey

Page (NA) Paper number 1721

Abstract:

In the absence of HMMs trained with speech collected in the target environment, one may use HMMs trained with a large amount of speech collected in another recording condition (e.g., a quiet office with a high-quality microphone). However, this may result in poor performance because of the mismatch between the two acoustic conditions. We propose a linear regression-based model adaptation procedure to reduce such a mismatch. With some adaptation utterances collected in the target environment, the procedure transforms the HMMs trained in a quiet condition to maximize the likelihood of observing the adaptation utterances. The transformation must be designed to maintain the speaker independence of the HMMs. Our speaker-independent test results show that with this procedure a digit error rate of about 1% can be achieved for hands-free recognition, using target-environment speech from only 20 speakers.
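
The general idea of a regression-based mean transformation can be sketched with a much simpler stand-in: a per-dimension affine map fitted by least squares, assuming each adaptation frame is already aligned to one Gaussian mean and all variances are equal (in which case least squares coincides with maximum likelihood). This is not the authors' procedure, and every name below is hypothetical.

```python
def fit_diagonal_transform(aligned_means, observations):
    """Fit o_d ~ a_d * mu_d + b_d independently for each feature dimension.

    aligned_means[i] is the clean-model mean aligned to adaptation frame
    observations[i]. Returns a list of (a_d, b_d) pairs.
    """
    n = len(aligned_means)
    params = []
    for d in range(len(aligned_means[0])):
        xs = [m[d] for m in aligned_means]
        ys = [o[d] for o in observations]
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        a = sxy / sxx if sxx > 0 else 1.0  # assumption: unit slope if degenerate
        params.append((a, mean_y - a * mean_x))
    return params

def transform_mean(mean, params):
    """Apply the fitted affine map to one Gaussian mean vector."""
    return [a * m + b for m, (a, b) in zip(mean, params)]

# Toy data in which the "car" environment scales and shifts each dimension.
clean_means = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
car_frames = [[1.0, -0.5], [3.0, 0.5], [5.0, 1.5]]
params = fit_diagonal_transform(clean_means, car_frames)
adapted = transform_mean([4.0, 7.0], params)
```

Because one shared transform is applied to all means, a small amount of adaptation data can update Gaussians that were never observed in the target environment.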

IC991721.PDF (From Author) IC991721.PDF (Rasterized)


Channel and Noise Adaptation via HMM Mixture Mean Transform and Stochastic Matching

Authors:

Shuen Kong Wong
Bertram Shi

Page (NA) Paper number 2228

Abstract:

We present a non-linear model transformation for adapting Gaussian mixture HMMs that use both static and dynamic MFCC observation vectors to additive noise and constant system tilt. This transformation depends on a few compensation coefficients, which can be estimated from channel-distorted speech via maximum-likelihood stochastic matching. Experimental results validate the effectiveness of the adaptation. We also provide an adaptation strategy which can result in improved performance at reduced computational cost compared with a straightforward implementation of stochastic matching.

IC992228.PDF (From Author) IC992228.PDF (Rasterized)
