Acoustic Modeling II


Frame Discrimination Training of HMMs for Large Vocabulary Speech Recognition

Authors:

Dan Povey
Philip C Woodland

Page (NA) Paper number 2315

Abstract:

This paper describes the application of a discriminative HMM parameter estimation technique, Frame Discrimination (FD), to medium and large vocabulary continuous speech recognition. Previous work showed that FD training gave better results than maximum mutual information (MMI) training on small tasks. Applying FD to much larger tasks required a technique for rapidly finding, for each frame, the most likely set of Gaussians in the system. Experiments on the Resource Management and North American Business tasks show that FD training can give improvements comparable to MMI while being less computationally intensive.

IC992315.PDF (From Author) IC992315.PDF (Rasterized)



Discriminative Mixture Weight Estimation for Large Gaussian Mixture Models

Authors:

Francoise Beaufays
Mitchel Weintraub
Yochai Konig

Page (NA) Paper number 1466

Abstract:

This paper describes a new approach to acoustic modeling for large vocabulary continuous speech recognition (LVCSR) systems. Each phone is modeled with a large Gaussian mixture model (GMM) whose context-dependent mixture weights are estimated with a sentence-level discriminative training criterion. The estimation problem is cast in a neural network framework, which enables the incorporation of the appropriate constraints on the mixture weight vectors and allows a straightforward training procedure based on steepest descent. Experiments conducted on the Callhome-English and Switchboard databases show a significant improvement in acoustic model performance, and a somewhat smaller improvement with the combined acoustic and language models.
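One common way to keep mixture weights positive and summing to one under gradient-based training is a softmax parameterization. The sketch below illustrates that idea only: the softmax, the function names, and the learning rate are assumptions rather than details from the abstract, and a plain per-frame log-likelihood stands in for the paper's sentence-level discriminative criterion.

```python
import math

def softmax(logits):
    """Map unconstrained values to weights that are positive and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def frame_loglik(logits, gauss_liks):
    """Log-likelihood of one frame; gauss_liks[k] is the k-th Gaussian's likelihood."""
    weights = softmax(logits)
    return math.log(sum(w * p for w, p in zip(weights, gauss_liks)))

def ascent_step(logits, gauss_liks, lr=0.5):
    """One steepest-ascent step on the frame log-likelihood.

    With softmax weights, the gradient with respect to logit k is
    (posterior of component k) - (current weight k).
    """
    weights = softmax(logits)
    mix = sum(w * p for w, p in zip(weights, gauss_liks))
    posteriors = [w * p / mix for w, p in zip(weights, gauss_liks)]
    return [z + lr * (g - w) for z, g, w in zip(logits, posteriors, weights)]

# Hypothetical frame: three Gaussians with fixed likelihoods.
logits = [0.0, 0.0, 0.0]
liks = [0.1, 0.5, 0.2]
updated = ascent_step(logits, liks)
```

Because the update acts on the unconstrained logits, the weights returned by `softmax(updated)` remain a valid probability vector after every step, with no projection needed.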

IC991466.PDF (From Author) IC991466.PDF (Rasterized)



Modeling Disfluency And Background Events In ASR For A Natural Language Understanding Task

Authors:

Richard C. Rose
Giuseppe Riccardi

Page (NA) Paper number 1709

Abstract:

This paper investigates techniques for minimizing the impact of non-speech sounds on the performance of large vocabulary continuous speech recognition (LVCSR) systems. An experimental study is presented that investigates whether careful manual labeling of disfluency and background events in conversational speech can provide an additional level of supervision in training HMM acoustic models and statistical language models. First, techniques are investigated for incorporating explicitly labeled disfluency and background events directly into the acoustic HMM model. Second, phrase-based statistical language models are trained from utterance transcriptions that include labeled instances of these events. Finally, it is shown that significant word accuracy and run-time performance improvements are obtained for both sets of techniques on a telephone-based spoken language understanding task.

IC991709.PDF (From Author) IC991709.PDF (Rasterized)



Decision Tree State Tying Based on Penalized Bayesian Information Criterion

Authors:

Wu Chou
Wolfgang Reichl

Page (NA) Paper number 2481

Abstract:

This paper describes a penalized Bayesian information criterion (pBIC) for decision tree state tying. The pBIC is applied in two ways. First, it is used as a decision tree growing criterion in place of the conventional heuristic constant threshold. The original BIC penalty is found to be too low to yield a compact decision tree state tying model. Based on Wolfe's modification to the asymptotic null distribution, it is derived that twice the BIC penalty should be used for decision tree state tying based on pBIC. Second, pBIC is studied as a model compression criterion for acoustic modeling based on decision tree state tying. Experimental results on a large vocabulary (Wall Street Journal) speech recognition task indicate that a compact decision tree can be achieved with almost no loss in recognition performance.
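A BIC-style split test can be sketched as follows. This is a minimal illustration, assuming one-dimensional data, a single Gaussian per node, and a penalty of the usual BIC form (half the number of added parameters times the log of the sample count); the function names and the `penalty_scale` parameter are hypothetical. Setting `penalty_scale=2.0` mirrors the doubled penalty mentioned in the abstract, but the paper's exact formulation may differ.

```python
import math

def gaussian_loglik(xs):
    """Log-likelihood of xs under a Gaussian fitted to xs by maximum likelihood."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-8)  # variance floor
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def split_gain(left, right):
    """Log-likelihood gain from modeling the two children separately."""
    return gaussian_loglik(left) + gaussian_loglik(right) - gaussian_loglik(left + right)

def bic_penalty(n_total, penalty_scale=1.0, added_params=2):
    """BIC penalty for the parameters a split adds (one extra mean and variance)."""
    return penalty_scale * 0.5 * added_params * math.log(n_total)

def accept_split(left, right, n_total, penalty_scale=2.0):
    """Accept the split only if the gain exceeds the (scaled) BIC penalty."""
    return split_gain(left, right) > bic_penalty(n_total, penalty_scale)

# Hypothetical data: two well-separated clusters, then two overlapping ones.
far_left = [0.0, 0.1, -0.1, 0.05, -0.05]
far_right = [5.0, 5.1, 4.9, 5.05, 4.95]
near_left = [0.0, 0.2, -0.2, 0.1]
near_right = [0.05, -0.1, 0.15, -0.05]
```

Replacing a fixed threshold with a penalty of this kind ties the stopping rule to the amount of data and the number of parameters, so a larger `penalty_scale` yields a smaller tree.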

IC992481.PDF (From Author) IC992481.PDF (Rasterized)



A 2D Extended HMM for Speech Recognition

Authors:

Jiayu Li
Alejandro Murua

Page (NA) Paper number 1841

Abstract:

A two-dimensional extension of Hidden Markov Models (HMMs) is introduced, aimed at improving the modeling of speech signals. The extended model (a) focuses on the conditional joint distribution of state durations given the length of utterances, rather than on state transition probabilities; (b) extends the dependency of observation densities to the current state as well as neighboring states; and (c) introduces a local averaging procedure to smooth the outcome associated with transitions from successive states. A set of efficient iterative algorithms for implementing the extended model, based on segmental K-means and Iterative Conditional Modes, is also presented. In applications to the recognition of segmented digits spoken over the telephone, the extended model achieved a reduction of about 23% in the recognition error rate compared with HMMs.

IC991841.PDF (From Author) IC991841.PDF (Rasterized)



Probabilistic Classification of HMM States for Large Vocabulary Continuous Speech Recognition

Authors:

Xiaoqiang Luo
Frederick Jelinek

Page (NA) Paper number 2044

Abstract:

In state-of-the-art large vocabulary continuous speech recognition (LVCSR) systems, HMM state tying is often used to achieve a good balance between model resolution and robustness. In this paradigm, tied HMM states share a single set of parameters and are indistinguishable. To capture the fine differences among tied HMM states, this paper proposes the probabilistic classification of HMM states (PCHMM) for LVCSR. In particular, a distribution from an HMM state to classes is introduced. It is shown that the state-to-class distribution can be estimated together with conventional HMM parameters within the EM framework. Compared with HMM state tying, probabilistic classification of HMM states makes more efficient use of model parameters. It also makes the acoustic model more robust against possible mismatch or variation between training and test data. The viability of this approach is verified by a significant reduction in word error rate (WER) on the Switchboard task.

IC992044.PDF (From Author) IC992044.PDF (Rasterized)



The HDM: A Segmental Hidden Dynamic Model of Coarticulation

Authors:

Hywel B Richards, Dragon Systems UK (U.K.)
John S Bridle, Dragon Systems UK (U.K.)

Page (NA) Paper number 1930

Abstract:

This paper introduces a new approach to acoustic-phonetic modelling, the Hidden Dynamic Model (HDM), which explicitly accounts for the coarticulation and transitions between neighbouring phones. Inspired by the fact that speech is really produced by an underlying dynamic system, the HDM consists of a single vector target per phone in a hidden dynamic space in which speech trajectories are produced by a simple dynamic system. The hidden space is mapped to the surface acoustic representation via a non-linear mapping in the form of a multilayer perceptron (MLP). Algorithms are presented for training of all the parameters (target vectors and MLP weights) from segmented and labelled acoustic observations alone, with no special initialisation. The model captures the dynamic structure of speech, and appears to aid a speech recognition task based on the SwitchBoard corpus.

IC991930.PDF (From Author) IC991930.PDF (Rasterized)



Maximum Likelihood Estimates for Exponential Type Density Families

Authors:

Sankar Basu
Charles A Micchelli
Peder A Olsen

Page (NA) Paper number 2066

Abstract:

We consider a parametric family of density functions of the type exp(-|x|^(alpha)/2) for modeling acoustic feature vectors used in automatic recognition of speech. The parameter "alpha" is a measure of the impulsiveness as well as the non-Gaussian nature of the data. While previous work has focussed on estimating the mean and the variance of the data, here we attempt to estimate the impulsiveness "alpha" from the data on a maximum likelihood basis. We show that there is a balance between "alpha" and the number of data points "N" that must be satisfied before maximum likelihood estimation is carried out. Numerical experiments are performed on multidimensional vectors obtained from speech data.
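To make the estimation problem concrete, the sketch below fits "alpha" by a grid search over the log-likelihood. It rests on several assumptions not stated in the abstract: one-dimensional data that is already centered and scaled, the normalized density alpha / (2^(1+1/alpha) Gamma(1/alpha)) exp(-|x|^alpha / 2), and a grid search in place of whatever optimizer the authors used. Function names and grid bounds are hypothetical.

```python
import math
import random

def log_normalizer(alpha):
    """Log of the constant that makes exp(-|x|**alpha / 2) integrate to 1."""
    return (math.log(alpha)
            - (1.0 + 1.0 / alpha) * math.log(2.0)
            - math.lgamma(1.0 / alpha))

def log_likelihood(alpha, xs):
    """Total log-likelihood of centered, unit-scale samples xs."""
    return len(xs) * log_normalizer(alpha) - 0.5 * sum(abs(x) ** alpha for x in xs)

def estimate_alpha(xs, lo=0.5, hi=4.0, step=0.05):
    """Return the grid value of alpha with the highest log-likelihood."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return max(grid, key=lambda a: log_likelihood(a, xs))

# Synthetic check: standard normal samples correspond to alpha = 2.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]
alpha_hat = estimate_alpha(samples)
```

For alpha = 2 the normalizer reduces to 1/sqrt(2*pi), the standard Gaussian constant, which gives a quick sanity check on the formula. With few samples the likelihood surface over alpha is flatter and the estimate less reliable.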

IC992066.PDF (From Author) IC992066.PDF (Rasterized)
