Adaptation/Normalization

Fast Speaker Adaptation Using a priori Knowledge

Authors:

Roland Kuhn,
Patrick Nguyen,
Jean-Claude Junqua,
Robert C Boman,
Nancy A Niedzielski,
Steven C Fincke,
Kenneth L Field,
Matteo Contolini,

Page (NA) Paper number 1587

Abstract:

Recently, we presented a radically new class of fast adaptation techniques for speech recognition, based on prior knowledge of speaker variation. To obtain this prior knowledge, one applies a dimensionality reduction technique to T vectors of dimension D derived from T speaker-dependent (SD) models. This offline step yields T basis vectors, the eigenvoices. We constrain the model for a new speaker S to be located in the space spanned by the first K eigenvoices. Speaker adaptation then involves estimating K eigenvoice coefficients for the new speaker; typically, K is very small compared to the original dimension D. Here, we review how to find the eigenvoices, give a maximum-likelihood estimator for the new speaker's eigenvoice coefficients, and summarize mean adaptation experiments carried out on the Isolet database. We present new results which assess the impact on performance of changes in the training of the SD models. Finally, we interpret the first few eigenvoices obtained.
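As a rough illustration of the eigenvoice scheme summarized above, the following NumPy sketch builds eigenvoices by PCA (via SVD) over speaker-dependent supervectors and then constrains a new speaker's supervector to their span. It is a simplification under assumed interfaces: the paper's maximum-likelihood (MLED) coefficient estimator works directly from adaptation frames, whereas here it is replaced by a plain least-squares projection, and all names and dimensions are invented for the example.

```python
import numpy as np

def train_eigenvoices(sd_supervectors, K):
    """Offline step: dimensionality reduction (PCA via SVD) of T speaker-dependent
    supervectors of dimension D, returning the mean voice and the first K eigenvoices."""
    mean_voice = sd_supervectors.mean(axis=0)
    _, _, vt = np.linalg.svd(sd_supervectors - mean_voice, full_matrices=False)
    return mean_voice, vt[:K]                         # eigenvoices: K x D

def adapt_new_speaker(target_supervector, mean_voice, eigenvoices):
    """Constrain the new speaker's model to the span of the first K eigenvoices.
    target_supervector is a rough supervector estimate from the adaptation data;
    the paper instead estimates the K coefficients directly from the frames with a
    maximum-likelihood (MLED) estimator, which this least-squares step only imitates."""
    w, *_ = np.linalg.lstsq(eigenvoices.T, target_supervector - mean_voice, rcond=None)
    return mean_voice + eigenvoices.T @ w, w          # adapted supervector, K coefficients

# Toy usage: T=20 training speakers, D=300-dimensional supervectors, K=5.
rng = np.random.default_rng(0)
mean_voice, evoices = train_eigenvoices(rng.normal(size=(20, 300)), K=5)
adapted, coeffs = adapt_new_speaker(rng.normal(size=300), mean_voice, evoices)
```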

IC991587.PDF (From Author) IC991587.PDF (Rasterized)

Speaker Adaptation Using Maximum Likelihood Model Interpolation

Authors:

Zuoying Wang, EE Department, Tsinghua Univ., Beijing, China
Feng Liu, Electronic Engineering Department, Tsinghua Univ., Beijing, China

Page (NA) Paper number 1368

Abstract:

A speaker adaptation scheme named maximum likelihood model interpolation (MLMI) is proposed. The basic idea of MLMI is to compute the speaker-adapted (SA) model of a test speaker as a linear convex combination of a set of speaker-dependent (SD) models. Given a set of training speakers, we first calculate the corresponding SD models for each training speaker as well as the speaker-independent (SI) models. Then, the mean vector of the SA model is computed as a weighted sum of the SD mean vectors, while the covariance matrix is the same as that of the SI model. An algorithm is given that estimates the weight parameters by maximizing the likelihood of the adaptation data under the SA model. Experiments show that 3 adaptation sentences can give a significant performance improvement. As the number of SD models increases, further improvement can be obtained.
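A minimal sketch of the weight estimation idea, assuming a single Gaussian state with a diagonal SI covariance: the adapted mean is constrained to a convex combination of SD means and the weights are found by projected gradient ascent on the log-likelihood of the adaptation frames. This is not the authors' algorithm (they give a dedicated ML estimator); the function names, the optimizer, and the toy data are illustrative assumptions.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (keeps the weights convex)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def mlmi_weights(frames, sd_means, si_cov_diag, n_iter=500, lr=1e-3):
    """Illustrative weight estimation for one state: the adapted mean is M @ w, a convex
    combination of SD means, and w is found by projected gradient ascent on the Gaussian
    log-likelihood of the adaptation frames, with the SI diagonal covariance held fixed."""
    M = sd_means.T                       # D x R matrix of SD mean vectors
    R = sd_means.shape[0]
    prec = 1.0 / si_cov_diag             # diagonal precision of the SI model
    w = np.full(R, 1.0 / R)              # start from the uniform combination
    for _ in range(n_iter):
        residual = (frames - M @ w).sum(axis=0)      # sum_t (x_t - mu)
        w = project_to_simplex(w + lr * (M.T @ (prec * residual)))
    return w

# Toy usage: 50 adaptation frames, 4 SD models, 10-dimensional features.
rng = np.random.default_rng(0)
w = mlmi_weights(rng.normal(size=(50, 10)), rng.normal(size=(4, 10)), np.ones(10))
```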

IC991368.PDF (From Author) IC991368.PDF (Rasterized)

Speaker Adaptation with All-Pass Transforms

Authors:

John W McDonough,
William J Byrne,

Page (NA) Paper number 2093

Abstract:

In recent work, a class of transforms was proposed that achieves a remapping of the frequency axis much like conventional vocal tract length normalization. These mappings, known collectively as all-pass transforms (APT), were shown to produce substantial improvements in the performance of a large vocabulary speech recognition system when used to normalize incoming speech prior to recognition. In this application, the most advantageous characteristic of the APT was its cepstral-domain linearity; this linearity makes speaker normalization simple to implement, and provides for the robust estimation of the parameters characterizing individual speakers. In the current work, we exploit the APT to develop a speaker adaptation scheme in which the cepstral means of a speech recognition model are transformed to better match the speech of a given speaker. In a set of speech recognition experiments conducted on the Switchboard Corpus, we report reductions in word error rate of 3.7% absolute.
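The cepstral-domain linearity mentioned in the abstract can be made concrete with a small numerical sketch: for a first-order all-pass (bilinear) warp, the warped cepstrum is a fixed matrix times the original cepstrum. The matrix below is built by quadrature rather than the analytic derivations used in the APT papers, and the cepstrum convention, grid size, and warp factor are assumptions for illustration.

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """Frequency mapping of a first-order all-pass (bilinear) transform."""
    return omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))

def apt_cepstral_matrix(n_ceps, alpha, n_grid=4096):
    """Matrix A with c_warped = A @ c for real cepstra c, using the convention
    log|S(w)| = c_0 + 2 * sum_{n>=1} c_n cos(n w). Built by numerical quadrature;
    at alpha = 0 the warp is the identity and A reduces to the identity matrix."""
    omega = np.linspace(0.0, np.pi, n_grid)
    warped = bilinear_warp(omega, alpha)
    A = np.empty((n_ceps, n_ceps))
    for n in range(n_ceps):
        basis = np.ones_like(omega) if n == 0 else 2.0 * np.cos(n * warped)
        for m in range(n_ceps):
            A[m, n] = np.trapz(basis * np.cos(m * omega), omega) / np.pi
    return A

# Adapting model means is then just a matrix multiply: warped_mean = A @ cepstral_mean.
A = apt_cepstral_matrix(n_ceps=13, alpha=0.1)
```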

IC992093.PDF (From Author) IC992093.PDF (Rasterized)

Improved Methods for Vocal Tract Normalization

Authors:

Lutz Welling,
Stephan Kanthak,
Hermann Ney,

Page (NA) Paper number 1436

Abstract:

This paper presents improved methods for vocal tract normalization (VTN) along with experimental tests on three databases. We propose a new method for VTN in training: using acoustic models with single Gaussian densities per state to select the normalization scales prevents the models from learning the normalization scales of the training speakers. We show that using single Gaussian densities for selecting the normalization scales in training results in lower error rates than using mixture densities. For VTN in recognition, we propose an improvement of the well-known multiple-pass strategy: using an unnormalized acoustic model for the first recognition pass instead of a normalized model yields lower error rates. In recognition tests, this method is compared with a fast variant of VTN. The multiple-pass strategy is efficient but suboptimal, because the normalization scale and the word sequence are determined sequentially. We found that for telephone digit string recognition this suboptimality reduces the VTN gain in recognition performance by 30% relative. On the German spontaneous scheduling task Verbmobil, the WSJ task and the German telephone digit string corpus SieTill the proposed methods for VTN reduce the error rates significantly.
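The control flow of the multiple-pass strategy, with the proposed unnormalized first pass, might look like the sketch below. The recognizer calls (decode, align_loglik, warp_features) are hypothetical stubs standing in for a real system, and the warp-scale grid is an assumed example; only the pass structure follows the abstract.

```python
import numpy as np

# Hypothetical placeholders standing in for a real recognizer; only the control
# flow of the multiple-pass VTN strategy is illustrated here.
def decode(features, model):                          # one recognition pass
    return "hypothesized word sequence"

def align_loglik(features, transcript, model):        # forced-alignment likelihood
    return -np.sum(features ** 2)                     # dummy score for the sketch

def warp_features(features, scale):                   # stand-in for a real spectral warp
    return features * scale

def vtn_multipass(features, unnormalized_model, normalized_model,
                  scales=np.arange(0.88, 1.13, 0.02)):
    # Pass 1: decode with the *unnormalized* model, as the paper proposes.
    first_hyp = decode(features, unnormalized_model)
    # Select the warping scale that maximizes the likelihood of the first-pass transcript.
    best_scale = max(scales,
                     key=lambda s: align_loglik(warp_features(features, s),
                                                first_hyp, unnormalized_model))
    # Pass 2: decode the warped features with the normalized model.
    return decode(warp_features(features, best_scale), normalized_model), best_scale

hyp, scale = vtn_multipass(np.random.randn(100, 13), "SI model", "VTN model")
```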

IC991436.PDF (From Author) IC991436.PDF (Rasterized)

Rapid Speech Recognizer Adaptation to New Speakers

Authors:

Vassilis Digalakis,
Sid Berkowitz,
Enrico L Bocchieri,
Costas Boulis,
William J Byrne,
Heather Collier,
Adrian Corduneanu,
Ashvin Kannan,
Sanjeev P Khudanpur,
Ananth Sankar,

Page (NA) Paper number 2102

Abstract:

This paper summarizes the work of the ``Rapid Speech Recognizer Adaptation'' team in the workshop held at Johns Hopkins University in the summer of 1998. The project addressed the modeling of dependencies between units of speech with the goal of making more effective use of small amounts of data for speaker adaptation. A variety of methods were investigated and their effectiveness in a rapid adaptation task defined on the SWITCHBOARD conversational speech corpus is reported.

IC992102.PDF (From Author) IC992102.PDF (Rasterized)

Tree-Structured Models of Parameter Dependence for Rapid Adaptation in Large Vocabulary Conversational Speech Recognition

Authors:

Ashvin Kannan,
Sanjeev P Khudanpur,

Page (NA) Paper number 2197

Abstract:

Two models of statistical dependence between acoustic model parameters of a large vocabulary conversational speech recognition (LVCSR) system are investigated for the purpose of rapid speaker and environment adaptation from a very small amount of speech: (i) a Gaussian multiscale process governed by a stochastic linear dynamical system on a tree, and (ii) a simple hierarchical tree-structured prior. Both methods permit Bayesian (MAP) estimation of acoustic model parameters without parameter tying, even when no samples are available to independently estimate some parameters due to the limited amount of adaptation data. Modeling methodologies are contrasted, and comparative performance of the two on the Switchboard task is presented under identical test conditions for supervised and unsupervised adaptation with controlled amounts of adaptation speech. Both methods provide a significant (1% absolute) gain in accuracy over adaptation methods that do not exploit the dependence between acoustic model parameters.
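To make the tree-structured idea concrete, here is a minimal sketch of hierarchical MAP smoothing over a tree of Gaussian means: adaptation statistics are pooled bottom-up, and each node's mean is then shrunk top-down toward its parent's adapted mean, so leaves with no data still move. The tree, the single smoothing weight tau, and the class and function names are assumptions; the paper's two dependence models (a multiscale linear dynamical system and a tree-structured prior) are richer than this.

```python
import numpy as np

class Node:
    """One node of a tree over Gaussian mean parameters (leaves = HMM Gaussians)."""
    def __init__(self, prior_mean, children=None):
        self.prior_mean = np.asarray(prior_mean, dtype=float)
        self.children = children or []
        self.stat_sum = np.zeros_like(self.prior_mean)   # sum of adaptation frames
        self.count = 0.0

def collect_stats(node):
    """Pool adaptation statistics bottom-up so interior nodes see all descendant data."""
    for child in node.children:
        s, c = collect_stats(child)
        node.stat_sum += s
        node.count += c
    return node.stat_sum.copy(), node.count

def map_adapt(node, parent_mean=None, tau=10.0):
    """Top-down MAP smoothing: each node shrinks toward its (adapted) parent,
    so leaves with no data inherit their ancestors' adapted means."""
    prior = node.prior_mean if parent_mean is None else parent_mean
    node.adapted_mean = (tau * prior + node.stat_sum) / (tau + node.count)
    for child in node.children:
        map_adapt(child, node.adapted_mean, tau)

# Toy tree: a root with two leaves; only leaf_a receives adaptation data.
leaf_a, leaf_b = Node([0.0, 0.0]), Node([1.0, 1.0])
root = Node([0.5, 0.5], children=[leaf_a, leaf_b])
leaf_a.stat_sum, leaf_a.count = np.array([6.0, 6.0]), 3.0
collect_stats(root)
map_adapt(root)   # leaf_b still moves, via the adapted root mean
```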

IC992197.PDF (From Author) IC992197.PDF (Rasterized)

Correlation Modeling of MLLR Transform Biases for Rapid HMM Adaptation to New Speakers

Authors:

Enrico L Bocchieri,
Vassilis Digalakis,
Adrian Corduneanu,
Costas Boulis,

Page (NA) Paper number 2343

Abstract:

This paper concerns rapid adaptation of hidden Markov model (HMM) based speech recognizers to a new speaker, when only a few speech samples (one minute or less) are available from the new speaker. A widely used family of adaptation algorithms defines adaptation as a linearly constrained reestimation of the HMM Gaussians. With so little speech data, tight constraints must be introduced, by reducing the number of linear transforms and by specifying certain transform structures (e.g. block diagonal). We hypothesize that under these adaptation conditions, the residual errors of the adapted Gaussian parameters can be represented and corrected by dependency models, as estimated from a training corpus. Thus, after introducing a particular class of linear transforms, we develop correlation models of the transform parameters. In rapid adaptation experiments on the SWITCHBOARD corpus, the proposed algorithm performs better than both transform-constrained adaptation and adaptation based on correlation modeling of the HMM parameters.
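One generic way to use such dependency models, shown below as a sketch, is to fit a joint Gaussian over per-class transform bias vectors across training speakers and then predict the biases that the adaptation data cannot support from the few that it can, via conditional-Gaussian (linear MMSE) prediction. This is an illustration of the idea rather than the paper's algorithm; the data layout, regularization, and names are assumptions.

```python
import numpy as np

def fit_bias_prior(training_biases):
    """training_biases: (num_speakers, R*D) stacked per-class transform bias vectors,
    one row per training speaker. Returns the Gaussian dependency model (mean, cov)."""
    mu = training_biases.mean(axis=0)
    cov = np.cov(training_biases, rowvar=False)
    return mu, cov

def predict_missing_biases(observed, obs_idx, mu, cov, eps=1e-6):
    """Linear MMSE (conditional-Gaussian) prediction of the unobserved bias
    components from the few that could be estimated from the adaptation data."""
    n = len(mu)
    miss_idx = np.setdiff1d(np.arange(n), obs_idx)
    S_oo = cov[np.ix_(obs_idx, obs_idx)] + eps * np.eye(len(obs_idx))
    S_mo = cov[np.ix_(miss_idx, obs_idx)]
    pred = np.empty(n)
    pred[obs_idx] = observed
    pred[miss_idx] = mu[miss_idx] + S_mo @ np.linalg.solve(S_oo, observed - mu[obs_idx])
    return pred

# Toy usage: 2 regression classes with 3-dim biases; class 0 observed, class 1 predicted.
rng = np.random.default_rng(1)
train = rng.normal(size=(50, 6))
mu, cov = fit_bias_prior(train)
full_bias = predict_missing_biases(rng.normal(size=3), np.arange(3), mu, cov)
```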

IC992343.PDF (From Author) IC992343.PDF (Rasterized)

Speech Recognition in a Reverberant Environment using Matched Filter Array (MFA) Processing and Linguistic-Tree Maximum Likelihood Linear Regression (LT-MLLR) Adaptation

Authors:

Prabhu Raghavan,
Richard J Renomeron,
Chiwei Che,
Dong-Suk Yuk,
James L Flanagan,

Page (NA) Paper number 2002

Abstract:

Automatic speech recognition systems trained on close-talking data perform poorly in a distant-talking environment because of the mismatch between training and testing conditions. Microphone array sound capture can reduce some of this mismatch by removing ambient noise and reverberation, but by itself offers insufficient improvement in performance. However, using array signal capture in conjunction with Hidden Markov Model (HMM) adaptation of the clean-speech models can result in improved recognition accuracy. This paper describes an experiment in which the output of an 8-element microphone array system using MFA processing is used for speech recognition with LT-MLLR adaptation. The recognition is done in two passes. In the first pass, an HMM trained on clean data is used to recognize the speech. Using the results of this pass, the HMM is adapted to the environment with the LT-MLLR algorithm. This adapted model, a product of MFA and LT-MLLR, yields improved recognition performance.
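For the array front end, matched-filter-array processing can be sketched as filtering each microphone channel with the time-reversed source-to-microphone impulse response and summing the outputs. The sketch below assumes those impulse responses are already known or estimated and uses synthetic signals; the paper's actual MFA processing and the LT-MLLR pass are not reproduced here.

```python
import numpy as np

def matched_filter_array(mic_signals, impulse_responses):
    """Matched-filter-array (MFA) beamforming sketch: filter each channel with the
    time-reversed room impulse response for that microphone and sum the outputs.
    Assumes the source-to-microphone impulse responses are known or estimated."""
    out = None
    for x, h in zip(mic_signals, impulse_responses):
        y = np.convolve(x, h[::-1])          # matched filter = time-reversed IR
        out = y if out is None else out + y
    return out / len(mic_signals)

# Toy usage: 8 microphones, synthetic signals and short decaying impulse responses.
rng = np.random.default_rng(2)
mics = [rng.normal(size=16000) for _ in range(8)]
irs = [rng.normal(size=256) * np.exp(-np.arange(256) / 64.0) for _ in range(8)]
enhanced = matched_filter_array(mics, irs)
```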

IC992002.PDF (From Author) IC992002.PDF (Rasterized)
