Authors:
Dan Povey,
Philip C Woodland,
Page (NA) Paper number 2315
Abstract:
This paper describes the application of a discriminative HMM parameter
estimation technique called Frame Discrimination (FD), to medium and
large vocabulary continuous speech recognition. Previous work showed
that FD training gave better results than maximum mutual information
(MMI) training for small tasks. The use of FD for much larger tasks
required the development of a technique to rapidly find
the most likely set of Gaussians for each frame in the system. Experiments
on the Resource Management and North American Business tasks show that
FD training can give comparable improvements to MMI, but is less computationally
intensive.
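The abstract does not spell out the fast Gaussian-selection scheme; as a point of reference, the exhaustive computation it accelerates (score every diagonal-covariance Gaussian on a frame and keep the k most likely) can be sketched as follows, with all names illustrative:

```python
import numpy as np

def topk_gaussians(frame, means, inv_vars, log_consts, k):
    """Return indices of the k highest-likelihood diagonal-covariance
    Gaussians for one feature frame.  This is the naive exhaustive
    scoring; the paper's contribution is a faster approximate scheme,
    not shown here."""
    # log N(x; mu, Sigma) for diagonal Sigma: constant term minus
    # half the variance-weighted squared distance
    diff = frame[None, :] - means                       # (G, D)
    logp = log_consts - 0.5 * np.sum(diff * diff * inv_vars, axis=1)
    top = np.argpartition(-logp, k - 1)[:k]             # unordered top-k
    return top[np.argsort(-logp[top])]                  # sorted by likelihood
```

For G Gaussians of dimension D this costs O(GD) per frame, which is exactly the expense the paper's selection technique avoids.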
Authors:
Francoise Beaufays,
Mitchel Weintraub,
Yochai Konig,
Page (NA) Paper number 1466
Abstract:
This paper describes a new approach to acoustic modeling for large
vocabulary continuous speech recognition (LVCSR) systems. Each phone
is modeled with a large Gaussian mixture model (GMM) whose context-dependent
mixture weights are estimated with a sentence-level discriminative
training criterion. The estimation problem is cast in a neural network
framework, which enables the incorporation of the appropriate constraints
on the mixture weight vectors and allows a straightforward training
procedure, based on steepest descent. Experiments conducted on the
Callhome-English and Switchboard databases show a significant improvement
of the acoustic model performance, and a somewhat lesser improvement
with the combined acoustic and language models.
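The key mechanics described here, mixture weights kept on the probability simplex via a softmax parameterization and updated by steepest descent, can be sketched as below. The objective used is plain data log-likelihood as a stand-in for the paper's sentence-level discriminative criterion, and all names are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_mixture_weights(comp_loglik, theta, lr=0.1, steps=200):
    """Steepest-descent training of GMM mixture weights w = softmax(theta).
    comp_loglik: (N, M) log-likelihood of each frame under each (fixed)
    Gaussian component.  The softmax enforces nonnegativity and
    sum-to-one, as in the neural network formulation."""
    n = comp_loglik.shape[0]
    for _ in range(steps):
        w = softmax(theta)
        log_wp = comp_loglik + np.log(w)          # log of w_m * p_m(x_n)
        gamma = np.exp(log_wp - np.logaddexp.reduce(log_wp, axis=1,
                                                    keepdims=True))
        # gradient of the log-likelihood w.r.t. theta via softmax chain rule
        theta = theta + lr * (gamma - w).sum(axis=0) / n
    return softmax(theta)
```

The softmax reparameterization is what removes the need for explicit constrained optimization: any unconstrained step in theta still yields a valid weight vector.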
Authors:
Richard C. Rose,
Giuseppe Riccardi,
Page (NA) Paper number 1709
Abstract:
This paper investigates techniques for minimizing the impact of non-speech
sounds on the performance of large vocabulary continuous speech recognition
(LVCSR) systems. An experimental study is presented that investigates
whether the careful manual labeling of disfluency and background events
in conversational speech can be used to provide an additional level
of supervision in training HMM acoustic models and statistical language
models. First, techniques are investigated for incorporating explicitly
labeled disfluency and background events directly into the acoustic
HMM model. Second, phrase-based statistical language models are trained
from utterance transcriptions which include labeled instances of these
events. Finally, it is shown that significant word accuracy and run-time
performance improvements are obtained for both sets of techniques on
a telephone-based spoken language understanding task.
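The language-model side of this idea, keeping labeled disfluency and background tokens in the training transcriptions so the model learns their contexts, can be illustrated with a minimal bigram model (the paper uses phrase-based models; this sketch and its token names are illustrative only):

```python
from collections import Counter

def train_bigram(transcripts):
    """Bigram counts over transcripts that keep disfluency/background
    tokens (e.g. [um], [laugh]) as ordinary vocabulary items, so their
    contexts are modeled instead of being discarded as noise."""
    uni, bi = Counter(), Counter()
    for sent in transcripts:
        toks = ["<s>"] + sent.split() + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    def prob(w, prev):      # ML estimate, no smoothing (sketch only)
        return bi[(prev, w)] / uni[prev] if uni[prev] else 0.0
    return prob

prob = train_bigram(["i want [um] a flight", "i want a ticket"])
```

Here `prob("[um]", "want")` is estimated from data rather than treated as an out-of-vocabulary event, which is the extra level of supervision the abstract refers to.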
Authors:
Wu Chou,
Wolfgang Reichl,
Page (NA) Paper number 2481
Abstract:
In this paper, an approach of penalized Bayesian information criterion
(pBIC) for decision tree state tying is described. The pBIC is applied
to two important applications. First, it is used as a decision tree
growing criterion in place of the conventional approach of using a
heuristic constant threshold. It is found that the original BIC penalty
is too low and does not lead to a compact decision tree state tying model.
Based on Wolfe's modification to the asymptotic null distribution,
it is derived that twice the BIC penalty should be used for decision
tree state tying based on pBIC. Secondly, pBIC is studied as a model
compression criterion for decision tree state tying based acoustic
modeling. Experimental results on a large vocabulary (Wall Street Journal)
speech recognition task indicate that a compact decision tree can be
achieved with almost no loss of speech recognition performance.
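The tree-growing criterion described here amounts to accepting a node split only when the likelihood gain exceeds a penalized BIC term. A one-dimensional Gaussian sketch (illustrative, not the paper's exact formulation) with the doubled penalty factor the abstract reports:

```python
import math

def gauss_loglik(xs):
    """Log-likelihood of data under its own ML-fit 1-D Gaussian."""
    n = len(xs)
    mu = sum(xs) / n
    var = max(sum((x - mu) ** 2 for x in xs) / n, 1e-8)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def accept_split(left, right, penalty_factor=2.0):
    """pBIC-style test: split only if the likelihood gain beats
    penalty_factor * (delta_params / 2) * log N.  Per the abstract,
    the standard BIC penalty (factor 1) is too low for compact
    state-tied models, hence the default factor of 2."""
    n = len(left) + len(right)
    gain = (gauss_loglik(left) + gauss_loglik(right)
            - gauss_loglik(left + right))
    delta_params = 2            # one extra (mean, variance) pair
    return gain > penalty_factor * (delta_params / 2.0) * math.log(n)
```

Growing the tree with this test replaces the heuristic constant threshold: the penalty scales with the amount of data at the node instead of being tuned by hand.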
Authors:
Jiayu Li,
Alejandro Murua,
Page (NA) Paper number 1841
Abstract:
A two-dimensional extension of Hidden Markov Models (HMM) is introduced,
aiming at improving the modeling of speech signals. The extended model
(a) focuses on the conditional joint distribution of state durations
given the length of utterances, rather than on state transition probabilities;
(b) extends the dependency of observation densities to current, as
well as neighboring states; and (c) introduces a local averaging procedure
to smooth the outcome associated with transitions from successive states.
A set of efficient iterative algorithms, based on segmental K-means
and Iterative Conditional Modes, for the implementation of the extended
model, is also presented. In applications to the recognition of segmented
digits spoken over the telephone, the extended model achieved about
23% reduction in the recognition error rate, when compared to the performance
of HMMs.
Authors:
Xiaoqiang Luo,
Frederick Jelinek,
Page (NA) Paper number 2044
Abstract:
In state-of-the-art large vocabulary continuous speech recognition (LVCSR)
systems, HMM state-tying is often used to achieve good balance between
the model resolution and robustness. In this paradigm, tied HMM states
share a single set of parameters and are non-distinguishable. To capture
the fine differences among tied HMM states, the probabilistic classification
of HMM states (PCHMM) is proposed in this paper for LVCSR. In particular,
a distribution from an HMM state to classes is introduced. It is shown
that the state-to-class distribution can be estimated together with
conventional HMM parameters within the EM framework. Compared with
HMM state-tying, probabilistic classification of HMM states makes more
efficient use of model parameters. It also makes the acoustic model
more robust against the possible mismatch or variation between training
and test data. The viability of this approach is verified by the significant
reduction of word error rate (WER) on the Switchboard task.
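The core estimation step, re-estimating each state's distribution over shared classes within EM, can be sketched as follows. The class densities are held fixed here for simplicity, whereas the paper re-estimates them jointly with the other HMM parameters; all names are illustrative:

```python
import numpy as np

def em_state_to_class(frames_per_state, class_loglik, n_classes, iters=5):
    """EM re-estimation of the state-to-class distributions
    p(class | state) of a PCHMM-style model.  class_loglik maps a
    frame x to the (C,) vector of log-likelihoods under the shared
    class densities."""
    n_states = len(frames_per_state)
    p = np.full((n_states, n_classes), 1.0 / n_classes)   # uniform start
    for _ in range(iters):
        new_p = np.zeros_like(p)
        for s, frames in enumerate(frames_per_state):
            for x in frames:
                # E-step: class posterior for this frame under state s
                log_post = np.log(np.maximum(p[s], 1e-12)) + class_loglik(x)
                new_p[s] += np.exp(log_post - np.logaddexp.reduce(log_post))
            new_p[s] /= new_p[s].sum()                    # M-step
        p = new_p
    return p
```

Because the classes are shared across states while each state keeps its own distribution over them, tied states can differ through their class weights, which is the extra resolution the abstract describes.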
Authors:
Hywel B Richards, Dragon Systems UK (U.K.)
John S Bridle, Dragon Systems UK (U.K.)
Page (NA) Paper number 1930
Abstract:
This paper introduces a new approach to acoustic-phonetic modelling,
the Hidden Dynamic Model (HDM), which explicitly accounts for the coarticulation
and transitions between neighbouring phones. Inspired by the fact that
speech is really produced by an underlying dynamic system, the HDM
consists of a single vector target per phone in a hidden dynamic space
in which speech trajectories are produced by a simple dynamic system.
The hidden space is mapped to the surface acoustic representation via
a non-linear mapping in the form of a multilayer perceptron (MLP).
Algorithms are presented for training of all the parameters (target
vectors and MLP weights) from segmented and labelled acoustic observations
alone, with no special initialisation. The model captures the dynamic
structure of speech, and appears to aid performance on a speech recognition
task based on the SwitchBoard corpus.
Authors:
Sankar Basu,
Charles A Micchelli,
Peder A Olsen,
Page (NA) Paper number 2066
Abstract:
We consider a parametric family of density functions of the type exp(-|x|^(alpha)/2)
for modeling acoustic feature vectors used in automatic recognition
of speech. The parameter "alpha" is a measure of the impulsiveness
as well as the non-Gaussian nature of the data. While previous work
has focused on estimating the mean and the variance of the data, here
we attempt to estimate the impulsiveness "alpha" from the data on a
maximum likelihood basis. We show that there is a balance between "alpha"
and the number of data points "N" that must be satisfied before maximum
likelihood estimation is carried out. Numerical experiments are performed
on multidimensional vectors obtained from speech data.