Acoustic Modeling I

1: Speech Processing
CELP Coding
Large Vocabulary Recognition
Speech Analysis and Enhancement
Acoustic Modeling I
ASR Systems and Applications
Topics in Speech Coding
Speech Analysis
Low Bit Rate Speech Coding I
Robust Speech Recognition in Noisy Environments
Speaker Recognition
Acoustic Modeling II
Speech Production and Synthesis
Feature Extraction
Robust Speech Recognition and Adaptation
Low Bit Rate Speech Coding II
Speech Understanding
Language Modeling I
2: Speech Processing, Audio and Electroacoustics, and Neural Networks
Acoustic Modeling III
Lexical Issues/Search
Speech Understanding and Systems
Speech Analysis and Quantization
Utterance Verification/Acoustic Modeling
Language Modeling II
Adaptation/Normalization
Speech Enhancement
Topics in Speaker and Language Recognition
Echo Cancellation and Noise Control
Coding
Auditory Modeling, Hearing Aids and Applications of Signal Processing to Audio and Acoustics
Spatial Audio
Music Applications
Application - Pattern Recognition & Speech Processing
Theory & Neural Architecture
Signal Separation
Application - Image & Nonlinear Signal Processing
3: Signal Processing Theory & Methods I
Filter Design and Structures
Detection
Wavelets
Adaptive Filtering: Applications and Implementation
Nonlinear Signals and Systems
Time/Frequency and Time/Scale Analysis
Signal Modeling and Representation
Filterbank and Wavelet Applications
Source and Signal Separation
Filterbanks
Emerging Applications and Fast Algorithms
Frequency and Phase Estimation
Spectral Analysis and Higher Order Statistics
Signal Reconstruction
Adaptive Filter Analysis
Transforms and Statistical Estimation
Markov and Bayesian Estimation and Classification
4: Signal Processing Theory & Methods II, Design and Implementation of Signal Processing Systems, Special Sessions, and Industry Technology Tracks
System Identification, Equalization, and Noise Suppression
Parameter Estimation
Adaptive Filters: Algorithms and Performance
DSP Development Tools
VLSI Building Blocks
DSP Architectures
DSP System Design
Education
Recent Advances in Sampling Theory and Applications
Steganography: Information Embedding, Digital Watermarking, and Data Hiding
Speech Under Stress
Physics-Based Signal Processing
DSP Chips, Architectures and Implementations
DSP Tools and Rapid Prototyping
Communication Technologies
Image and Video Technologies
Automotive Applications / Industrial Signal Processing
Speech and Audio Technologies
Defense and Security Applications
Biomedical Applications
Voice and Media Processing
Adaptive Interference Cancellation
5: Communications, Sensor Array and Multichannel
Source Coding and Compression
Compression and Modulation
Channel Estimation and Equalization
Blind Multiuser Communications
Signal Processing for Communications I
CDMA and Space-Time Processing
Time-Varying Channels and Self-Recovering Receivers
Signal Processing for Communications II
Blind CDMA and Multi-Channel Equalization
Multicarrier Communications
Detection, Classification, Localization, and Tracking
Radar and Sonar Signal Processing
Array Processing: Direction Finding
Array Processing Applications I
Blind Identification, Separation, and Equalization
Antenna Arrays for Communications
Array Processing Applications II
6: Multimedia Signal Processing, Image and Multidimensional Signal Processing, Digital Signal Processing Education
Multimedia Analysis and Retrieval
Audio and Video Processing for Multimedia Applications
Advanced Techniques in Multimedia
Video Compression and Processing
Image Coding
Transform Techniques
Restoration and Estimation
Image Analysis
Object Identification and Tracking
Motion Estimation
Medical Imaging
Image and Multidimensional Signal Processing Applications I
Segmentation
Image and Multidimensional Signal Processing Applications II
Facial Recognition and Analysis
Digital Signal Processing Education


Using A Large Vocabulary Continuous Speech Recognizer For A Constrained Domain With Limited Training

Authors:

Man-Hung Siu,
Michael Jonas,
Herbert Gish,

Page (NA) Paper number 2472

Abstract:

How to train a speech recognizer with a limited amount of training data is of interest to many researchers. In this paper, we describe how we use BBN's Byblos large vocabulary continuous speech recognition (LVCSR) system for the military air-traffic-control domain, where we have less than an hour of training data. We investigate three ways to deal with the limited training data: 1) re-configuring the LVCSR system to use fewer parameters, 2) incorporating out-of-domain data, and 3) using pragmatic information, such as speaker identity and controller function, to improve recognition performance. We compare the LVCSR performance to that of a tied-mixture recognizer designed for limited vocabularies. We show that the reconfigured LVCSR system outperforms the tied-mixture system by 10% in absolute word error rate. When enough data is available per speaker, vocal tract length normalization and supervised adaptation techniques can further improve performance by 6%, even for this domain with limited training. We also show that the use of out-of-domain data and pragmatic information, if available, can each further improve performance by 1-3%.

IC992472.PDF (From Author) IC992472.PDF (Rasterized)



Initial Evaluation of Hidden Dynamic Models on Conversational Speech

Authors:

Joseph Picone,
Sandi Pike,
Roland Reagan,
Terri Kamm,
John S Bridle, Dragon Systems U.K. (U.K.)
Li Deng,
Jeff Ma,
Hywel B Richards, Dragon Systems U.K. (U.K.)
Mike Schuster,

Page (NA) Paper number 2339

Abstract:

Conversational speech recognition is a challenging problem primarily because speakers rarely fully articulate sounds. A successful speech recognition approach must infer intended spectral targets from the speech data, or develop a method of dealing with large variances in the data. Hidden Dynamic Models (HDMs) attempt to automatically learn such targets in a hidden feature space using models that integrate linguistic information with constrained temporal trajectory models. HDMs are a radical departure from conventional hidden Markov models (HMMs), which simply account for variation in the observed data. In this paper, we present an initial evaluation of such models on a conversational speech recognition task involving a subset of the SWITCHBOARD corpus. We show that in an N-Best rescoring paradigm, HDMs are capable of delivering performance competitive with HMMs.

IC992339.PDF (From Author) IC992339.PDF (Rasterized)



Convolutional Density Estimation in Hidden Markov Models for Speech Recognition

Authors:

Spyros Matsoukas,
George Zavaliagkos,

Page (NA) Paper number 2379

Abstract:

In continuous density Hidden Markov Models (HMMs) for speech recognition, the probability density function (pdf) for each state is usually expressed as a mixture of Gaussians. In this paper, we present a model in which the pdf is expressed as the convolution of two densities. We focus on the special case where one of the convolved densities is an M-Gaussian mixture and the other is a mixture of N impulses. We present the re-estimation formulae for the parameters of the MxN convolutional model and suggest two ways of initializing them: a residual K-Means approach, and deconvolution from a standard HMM with MN Gaussians per state, using a genetic algorithm to search for the optimal assignment of Gaussians. Both methods result in a compact representation that requires only O(M + N) storage space for the model parameters, and O(MN) time for training and decoding. We explain how the decoding time can be reduced to O(M + kN), where k < M. Finally, results are shown on the 1996 Hub-4 Development test, demonstrating that a 32x2 convolutional model can achieve performance comparable to that of a standard 64-Gaussian per state model.
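
To make the storage saving concrete: convolving an M-component Gaussian mixture with an N-impulse mixture yields a density equivalent to an MN-component Gaussian mixture whose means are all sums mu_m + delta_n, while only M + N component parameter sets need to be stored. A minimal sketch, with all weights and parameters illustrative rather than taken from the paper:

```python
import numpy as np

def gauss(x, mu, var):
    """Scalar Gaussian density."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def conv_pdf(x, c, mu, var, w, delta):
    """Density of (M-Gaussian mixture) convolved with (N-impulse mixture),
    evaluated by expanding into the equivalent MN-component mixture:
    each component has mean mu_m + delta_n and weight c_m * w_n."""
    total = 0.0
    for cm, m, v in zip(c, mu, var):
        for wn, d in zip(w, delta):
            total += cm * wn * gauss(x, m + d, v)
    return total

# M = 2 Gaussians, N = 2 impulses: 4 effective components,
# but only M + N sets of location/weight parameters are stored.
c, mu, var = [0.4, 0.6], [0.0, 1.0], [0.5, 0.8]
w, delta = [0.5, 0.5], [-0.2, 0.2]

xs = np.linspace(-8.0, 10.0, 4000)
dx = xs[1] - xs[0]
area = sum(conv_pdf(x, c, mu, var, w, delta) for x in xs) * dx
print(round(area, 2))  # integrates to ~1, i.e. a valid density
```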

IC992379.PDF (From Author) IC992379.PDF (Rasterized)



Automatic Clustering And Generation Of Contextual Questions For Tied States In Hidden Markov Models

Authors:

Rita Singh,
Bhiksha Raj,
Richard M Stern,

Page (NA) Paper number 2487

Abstract:

Most current automatic speech recognition systems based on HMMs cluster or tie together subsets of the subword units with which speech is represented. This tying improves recognition accuracy when systems are trained with limited data, and is performed by classifying the sub-phonetic units using a series of binary tests based on speech production, called "linguistic questions". This paper describes a new method for automatically determining the best combinations of subword units to form these questions. The hybrid algorithm proposed clusters state distributions of context-independent phones to obtain questions for triphonetic contexts. Experiments confirm that the questions thus generated can replace manually generated questions and can provide improved recognition accuracy. Automatic generation of questions has the additional important advantage of extensibility to languages for which the phonetic structure is not well understood by the system designer, and can be effectively used in situations where the subword units are not phonetically motivated.
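
A toy sketch of the flavor of such hybrid clustering: agglomeratively group context-independent phone models by a distance between their distributions, and treat every merged cluster as a candidate "linguistic question", i.e. a phone set the tied-state decision tree may test jointly. The phone set, the single-mean summaries, and the Euclidean distance below are all illustrative stand-ins, not the paper's actual criterion:

```python
import numpy as np

# Hypothetical context-independent phone models, each summarized here by a
# single mean vector (a real system would cluster full state distributions,
# e.g. with a likelihood or divergence criterion).
phones = ["aa", "ae", "iy", "ih", "s", "z", "f", "v"]
rng = np.random.default_rng(0)
means = {p: rng.normal(size=4) for p in phones}

def agglomerate(items, vec):
    """Bottom-up clustering: every merged cluster becomes a candidate
    question (a set of phones) for triphone-context tree building."""
    clusters = [({p}, vec[p]) for p in items]
    questions = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = np.linalg.norm(clusters[i][1] - clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (si, vi), (sj, vj) = clusters[i], clusters[j]
        # merge the closest pair; new centroid is the size-weighted mean
        merged = (si | sj, (len(si) * vi + len(sj) * vj) / (len(si) + len(sj)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        questions.append(sorted(merged[0]))
    return questions

questions = agglomerate(phones, means)
print(len(questions))  # 8 phones -> 7 merges -> 7 candidate questions
```

Because the questions are derived from the data rather than written by hand, the same procedure carries over to languages whose phonetic structure the system designer does not know well, which is the extensibility argument the abstract makes.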

IC992487.PDF (From Author) IC992487.PDF (Rasterized)



Partly Hidden Markov Model and its Application to Speech Recognition

Authors:

Tetsunori Kobayashi,
Junko Furuyama,
Ken Masumitsu,

Page (NA) Paper number 2323

Abstract:

A new pattern matching method, the Partly Hidden Markov Model, is proposed and applied to speech recognition. The Hidden Markov Model, which is widely used for speech recognition, can deal only with piecewise stationary stochastic processes. We solved this problem by introducing a modified second-order Markov Model, in which the first state is hidden and the second one is observable. In this model, not only the feature parameter observations but also the state transitions are dependent on the previous feature observation. Therefore, even complicated transients can be modeled precisely. Simulation experiments showed the high potential of the proposed model. In word recognition tests, the error rate was reduced by 39% compared with a normal HMM.
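
Read literally, the model conditions both the emission and the hidden-state transition at time t on the previous observation. For a discrete-observation toy version, the forward recursion then indexes the transition and emission tables by the previous symbol. The sketch below is a hypothetical illustration of that dependence, not the paper's formulation:

```python
import numpy as np

# Toy setup: S hidden states, V discrete observation symbols. Unlike a
# standard HMM, the transition table A and emission table B are indexed
# by the PREVIOUS observation symbol as well.
S, V = 3, 4
rng = np.random.default_rng(0)
A = rng.dirichlet(np.ones(S), size=(V, S))   # A[prev_obs][i][j]
B = rng.dirichlet(np.ones(V), size=(V, S))   # B[prev_obs][j][obs_t]
pi = np.full(S, 1.0 / S)                     # uniform initial states
B0 = rng.dirichlet(np.ones(V), size=S)       # emissions at t=0 (no prev obs)

def forward_likelihood(obs):
    """Forward algorithm with observation-dependent transitions/emissions."""
    alpha = pi * B0[:, obs[0]]
    for t in range(1, len(obs)):
        prev = obs[t - 1]
        alpha = (alpha @ A[prev]) * B[prev][:, obs[t]]
    return alpha.sum()

p = forward_likelihood([0, 2, 1, 3])
print(0.0 < p < 1.0)  # a proper probability under this toy model
```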

IC992323.PDF (From Author) IC992323.PDF (Rasterized)



Hidden Markov Models With Divergence Based Vector Quantized Variances

Authors:

Jae H Kim,
Raziel Haimi-Cohen,
Frank K Soong,

Page (NA) Paper number 2337

Abstract:

This paper describes a method to significantly reduce the complexity of continuous density HMMs with only a small degradation in performance. The proposed method is noise-robust and may perform even better than the standard algorithm if training and testing noise conditions are not matched. The method is based on approximating the variance vectors of the Gaussian kernels by a vector quantization (VQ) codebook of small size. The quantization of the variance vectors is done using an information-theoretic distortion measure. Closed-form expressions are given for the computation of the VQ codebook, and the superiority of the proposed distortion measure over the Euclidean distance is demonstrated. The effectiveness of the proposed method is shown using the connected TI digits database and a noisy version of it. On this database, quantizing the variances to 16 levels maintains recognition performance within 1% degradation of the original system. In comparison, with Euclidean distortion, a codebook of size 256 is needed for a similar error rate.
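
The abstract does not spell out the distortion measure. One standard information-theoretic choice for diagonal variance vectors is a symmetric KL-style divergence between zero-mean Gaussians, which happens to admit a closed-form codeword (centroid) update; the sketch below uses that choice inside a Lloyd iteration purely as an illustration, and may differ from the paper's actual measure:

```python
import numpy as np

def divergence(v, c):
    """Symmetric-KL-style distortion between diagonal variance vectors
    (zero-mean Gaussians): sum_i (v_i/c_i + c_i/v_i - 2). Zero iff v == c."""
    return np.sum(v / c + c / v - 2.0, axis=-1)

def centroid(vs):
    """Closed-form minimizer of the distortion above over a cell:
    setting the derivative to zero gives c_i = sqrt(mean(v_i) / mean(1/v_i))."""
    return np.sqrt(vs.mean(axis=0) / (1.0 / vs).mean(axis=0))

def vq_variances(variances, codebook_size, iters=20, seed=0):
    """Lloyd iteration: assign each variance vector to its nearest codeword
    under the divergence, then re-estimate codewords in closed form."""
    rng = np.random.default_rng(seed)
    book = variances[rng.choice(len(variances), codebook_size, replace=False)]
    for _ in range(iters):
        dists = np.stack([divergence(variances, c) for c in book], axis=1)
        assign = dists.argmin(axis=1)
        book = np.stack([
            centroid(variances[assign == k]) if np.any(assign == k) else book[k]
            for k in range(codebook_size)
        ])
    return book, assign

rng = np.random.default_rng(1)
variances = rng.uniform(0.1, 2.0, size=(200, 8))  # toy variance vectors
book, assign = vq_variances(variances, codebook_size=16)
print(book.shape)  # (16, 8): a 16-level codebook of 8-dim variance vectors
```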

IC992337.PDF (From Author) IC992337.PDF (Rasterized)



HMM Training Based on Quality Measurement

Authors:

Yuqing Gao,
Ea-Ee Jan,
Mukund Padmanabhan,
Michael Picheny,

Page (NA) Paper number 2483

Abstract:

Two discriminant measures for HMM states to improve the effectiveness of HMM training are presented in this paper. In HMM-based speech recognition, the context-dependent states are usually modeled by Gaussian mixture distributions. In general, the number of Gaussian mixtures for each state is fixed or proportional to the amount of training data. From our study, some of the states are ``non-aggressive'' compared to others, and a higher acoustic resolution is required for them. Two methods are presented in this paper to determine those non-aggressive states. The first approach uses the recognition accuracy of the states, and the second is based on a rank distribution of the states. Baseline systems with 33K and 120K Gaussians, trained with a fixed number of Gaussian mixtures for each state, yield 14.57% and 13.04% word error rates, respectively. Using our approaches, a 38K-Gaussian system was constructed that reduces the error rate to 13.95%. The average ranks of the non-aggressive states in the rank lists of the testing data also seem to improve dramatically compared to the baseline systems.

IC992483.PDF (From Author) IC992483.PDF (Rasterized)



Prosodic Word Boundary Detection Using Statistical Modeling of Moraic Fundamental Frequency Contours and Its Use for Continuous Speech Recognition

Authors:

Koji Iwano,
Keikichi Hirose,

Page (NA) Paper number 2237

Abstract:

A new method for prosodic word boundary detection in continuous speech was developed based on the statistical modeling of moraic transitions of fundamental frequency (F0) contours, formerly proposed by the authors. In this method, F0 contours of prosodic words were modeled separately according to accent types. An input utterance was matched against the models and divided into its constituent prosodic words, yielding the prosodic word boundaries. The method was first applied to boundary detection experiments on the ATR continuous speech corpus. With mora boundary locations given in the corpus, the total detection rate reached 91.5%. The method was then integrated into a continuous speech recognition scheme with unlimited vocabulary. An improvement of a few percentage points was observed in mora recognition for the above corpus. Although all the experiments were done in closed conditions due to corpus availability, the results indicate the usefulness of the proposed method.

IC992237.PDF (From Author) IC992237.PDF (Rasterized)
