ICASSP99 Robust Speech Recognition and Adaptation

Robust Speech Recognition and Adaptation
Time-Varying Noise Compensation Using Multiple Kalman Filters
Authors: Nam Soo Kim
Page (NA) Paper number 1540
Abstract: The environmental conditions in which a speech recognition system must operate are usually nonstationary. We present an approach to compensate for the effects of time-varying noise using a bank of Kalman filters. The presented method is based on the interacting multiple model (IMM) technique, well known in the area of multiple target tracking. Moreover, we propose a way to obtain fixed-interval smoothed estimates of the environmental parameters. The performance of the proposed approaches is evaluated in continuous digit recognition experiments in which both slowly evolving and rapidly varying noise sources are added to simulate noisy environments.
IC991540.PDF (From Author) IC991540.PDF (Rasterized)

A Segment-based C0 Adaptation Scheme for PMC-based Noisy Mandarin Speech Recognition
Authors: Wei-Tyng Hong, Sin-Horng Chen
Page (NA) Paper number 1607
Abstract: A segment-based C0 (the zeroth-order cepstral coefficient) adaptation scheme for PMC-based Mandarin speech recognition is proposed in this paper. It incorporates a new C0 model of the speech signal into the PMC method to improve the gain matching between the clean-speech HMM models and the current noise model. The C0 model is constructed in the training phase by jointly modeling the normalized C0 with other MFCC recognition features to form C0-normalized HMM models. In the testing phase, it pre-segments the input utterance into syllable-like segments, performs C0-denormalization operations to expand the C0-normalized HMM models, and uses them in the PMC method. Compared with the conventional PMC method, the proposed method can achieve a much better noise compensation effect due to the use of more precise gain matching in the PMC model combination.
Experimental results showed that the base-syllable accuracy rate improved significantly for continuous noisy Mandarin speech recognition.
IC991607.PDF (From Author) IC991607.PDF (Rasterized)

Improved Parallel Model Combination Techniques With Split Gaussian Mixtures For Speech Recognition Under Noisy Conditions
Authors: Jeih-Weih Hung, Dept of Electrical Engineering, National Taiwan University (Taiwan); Jia-Lin Shen, Lin-Shan Lee, Dept of Electrical Engineering, National Taiwan University (Taiwan)
Page (NA) Paper number 2151
Abstract: The parallel model combination (PMC) technique has been very successful and is frequently used to improve the performance of speech recognition systems in noisy environments. This approach assumes that the log spectrum of speech signals is Gaussian-distributed, which is not always valid, especially when the HMMs have few mixtures. In this paper, a simple approach is proposed to improve the PMC method by splitting the mixtures before the domain transformation process in PMC is performed, and merging the mixtures back to the original number after the PMC processes are completed. Preliminary experimental results show that the increased number of mixtures during the PMC processes can in fact provide significant improvements in recognition accuracy over the original PMC method, especially when the SNR is low.
IC992151.PDF (From Author) IC992151.PDF (Rasterized)

Speech Recognition and Enhancement by A Nonstationary AR HMM with Gain Adaptation Under Unknown Noise
Authors: Gunther Ruske, Inst. for Human-Machine-Communication, Munich University of Technology, Germany (Germany); Ki Yong Lee, School of Electronic Engineering, Soongsil University, 1-1 Sangdo-5Dong, Dongjak-Ku, Seoul, 156-743 Korea (Korea)
Page (NA) Paper number 1425
Abstract: In this paper, a gain-adapted speech recognition method for unknown noise is developed in the time domain. The noise is assumed to be colored.
A nonstationary autoregressive (NAR) hidden Markov model (HMM) is used to model clean speech. The nonstationary AR is modeled by polynomial functions with a linear combination of M known basis functions. Enhancement using multiple Kalman filters is performed for the gain contour of speech and for estimation of the noise model when only the noisy signal is available.
IC991425.PDF (Scanned)

Database And Online Adaptation For Improved Speech Recognition In Car Environments
Authors: Alexander Fischer, Philips Research Laboratories, Aachen, Germany (Germany); Volker Stahl, Philips Research Laboratories, Aachen, Germany (Germany)
Page (NA) Paper number 1449
Abstract: Data collection in the car environment requires much more cost and time than in the telephone or office environment. Therefore, we apply supervised database adaptation from the telephone environment to the car environment to allow quick setup of car environment recognizers. Further reduction of the word error rate is obtained by unsupervised online adaptation during recognition. We investigate the common techniques MLLR and MAP for that purpose. We give results on command word recognition in the car environment for all combinations of database and online adaptation in task-dependent and task-independent scenarios. We demonstrate that speech recognizers for the car environment can be set up from telephone data and a limited amount of adaptation material from the car environment.
IC991449.PDF (From Author) IC991449.PDF (Rasterized)

Training of HMM with Filtered Speech Material for Hands-free Recognition
Authors: Diego Giuliani, Marco Matassoni, Maurizio Omologo, Piergiorgio Svaizer
Page (NA) Paper number 1895
Abstract: This paper addresses the problem of hands-free speech recognition in a noisy office environment.
An array of six omnidirectional microphones and a corresponding time delay compensation module are used to provide a beamformed signal as input to an HMM-based recognizer. Training of HMMs is performed using either a clean speech database or a filtered version of the same database. The filtering consists of a convolution with the acoustic impulse response between speaker and microphone, to reproduce the reverberation effect. Background noise is added to provide the desired SNR. The paper shows that the new models trained on these data perform better than the baseline ones. Furthermore, the paper investigates MLLR adaptation of the new models. It is shown that a further performance improvement is obtained, reaching a 98.7% WRR in a connected digit recognition task when the talker is 1.5 m from the array.
IC991895.PDF (From Author) IC991895.PDF (Rasterized)

Incremental Enrollment of Speech Recognizers
Authors: Chafic E Mokbel, France Telecom - CNET - DIH/DIPS (Currently at IDIAP) (France); Olivier Collin, France Telecom - CNET - DIH/DIPS (France)
Page (NA) Paper number 1468
Abstract: Classical adaptation approaches generally allow a reliably trained model to match a particular condition. In this paper, we define an incremental version of the segmental-EM algorithm. This method makes it possible to incrementally enrich a model first trained with a limited amount of data. Memory constraints allow only the initial data statistics to be stored. The proposed method uses these statistics by fixing, within the segmental EM algorithm applied to both initial and new data, the initial optimal paths in the model for the initial data. We proved theoretically that this is equivalent to segmental MAP adaptation with a specific choice of priors. In experiments on two speaker-dependent telephone databases, the approach made it possible to incrementally integrate new conditions of use.
The performance was slightly lower than that obtained with classical training over the whole data. As expected from the MAP interpretation of the algorithm, the characteristics of the initial data largely influence the model evolution.
IC991468.PDF (From Author) IC991468.PDF (Rasterized)

Automatic Speech Recognition: A Communication Perspective
Authors: Bishnu S Atal, AT&T Labs, Florham Park, NJ 07932, USA (USA)
Page (NA) Paper number 1910
Abstract: Speech recognition is usually regarded as a problem in the field of pattern recognition, where one first estimates the probability density function of each pattern to be recognized and then uses Bayes' theorem to identify the pattern that provides the highest likelihood for the observed speech data. In this paper, we will take a different approach to this problem. In speech recognition, the goal is communication of information by voice, and we will discuss the basics of speech recognition from a communication perspective. The speech signal at the acoustic level has a bit rate of 64 kb/s, but the underlying sound patterns have an information rate of less than 100 b/s. What is the role of this high bit rate at the acoustic level? We will discuss the principles of decoding patterns that are submerged in an ocean of seemingly irrelevant information.
IC991910.PDF (From Author) IC991910.PDF (Rasterized)