Acoustic Modeling I

1: Speech Processing
CELP Coding
Large Vocabulary Recognition
Speech Analysis and Enhancement
Acoustic Modeling I
ASR Systems and Applications
Topics in Speech Coding
Speech Analysis
Low Bit Rate Speech Coding I
Robust Speech Recognition in Noisy Environments
Speaker Recognition
Acoustic Modeling II
Speech Production and Synthesis
Feature Extraction
Robust Speech Recognition and Adaptation
Low Bit Rate Speech Coding II
Speech Understanding
Language Modeling I
2: Speech Processing, Audio and Electroacoustics, and Neural Networks
Acoustic Modeling III
Lexical Issues/Search
Speech Understanding and Systems
Speech Analysis and Quantization
Utterance Verification/Acoustic Modeling
Language Modeling II
Adaptation/Normalization
Speech Enhancement
Topics in Speaker and Language Recognition
Echo Cancellation and Noise Control
Coding
Auditory Modeling, Hearing Aids and Applications of Signal Processing to Audio and Acoustics
Spatial Audio
Music Applications
Application - Pattern Recognition & Speech Processing
Theory & Neural Architecture
Signal Separation
Application - Image & Nonlinear Signal Processing
3: Signal Processing Theory & Methods I
Filter Design and Structures
Detection
Wavelets
Adaptive Filtering: Applications and Implementation
Nonlinear Signals and Systems
Time/Frequency and Time/Scale Analysis
Signal Modeling and Representation
Filterbank and Wavelet Applications
Source and Signal Separation
Filterbanks
Emerging Applications and Fast Algorithms
Frequency and Phase Estimation
Spectral Analysis and Higher Order Statistics
Signal Reconstruction
Adaptive Filter Analysis
Transforms and Statistical Estimation
Markov and Bayesian Estimation and Classification
4: Signal Processing Theory & Methods II, Design and Implementation of Signal Processing Systems, Special Sessions, and Industry Technology Tracks
System Identification, Equalization, and Noise Suppression
Parameter Estimation
Adaptive Filters: Algorithms and Performance
DSP Development Tools
VLSI Building Blocks
DSP Architectures
DSP System Design
Education
Recent Advances in Sampling Theory and Applications
Steganography: Information Embedding, Digital Watermarking, and Data Hiding
Speech Under Stress
Physics-Based Signal Processing
DSP Chips, Architectures and Implementations
DSP Tools and Rapid Prototyping
Communication Technologies
Image and Video Technologies
Automotive Applications / Industrial Signal Processing
Speech and Audio Technologies
Defense and Security Applications
Biomedical Applications
Voice and Media Processing
Adaptive Interference Cancellation
5: Communications, Sensor Array and Multichannel
Source Coding and Compression
Compression and Modulation
Channel Estimation and Equalization
Blind Multiuser Communications
Signal Processing for Communications I
CDMA and Space-Time Processing
Time-Varying Channels and Self-Recovering Receivers
Signal Processing for Communications II
Blind CDMA and Multi-Channel Equalization
Multicarrier Communications
Detection, Classification, Localization, and Tracking
Radar and Sonar Signal Processing
Array Processing: Direction Finding
Array Processing Applications I
Blind Identification, Separation, and Equalization
Antenna Arrays for Communications
Array Processing Applications II
6: Multimedia Signal Processing, Image and Multidimensional Signal Processing, Digital Signal Processing Education
Multimedia Analysis and Retrieval
Audio and Video Processing for Multimedia Applications
Advanced Techniques in Multimedia
Video Compression and Processing
Image Coding
Transform Techniques
Restoration and Estimation
Image Analysis
Object Identification and Tracking
Motion Estimation
Medical Imaging
Image and Multidimensional Signal Processing Applications I
Segmentation
Image and Multidimensional Signal Processing Applications II
Facial Recognition and Analysis
Digital Signal Processing Education


Using A Large Vocabulary Continuous Speech Recognizer For A Constrained Domain With Limited Training

Authors:

Man-Hung Siu,
Michael Jonas,
Herbert Gish,

Page (NA) Paper number 2472

Abstract:

How to train a speech recognizer with a limited amount of training data is of interest to many researchers. In this paper, we describe how we use BBN's Byblos large vocabulary continuous speech recognition (LVCSR) system for the military air-traffic-control domain, where we have less than an hour of training data. We investigate three ways to deal with the limited training data: 1) re-configuring the LVCSR system to use fewer parameters, 2) incorporating out-of-domain data, and 3) using pragmatic information, such as speaker identity and controller function, to improve recognition performance. We compare the LVCSR performance to that of a tied-mixture recognizer designed for limited vocabularies. We show that the reconfigured LVCSR system outperforms the tied-mixture system by 10% in absolute word error rate. When enough data is available per speaker, vocal tract length normalization and supervised adaptation techniques can further improve performance by 6%, even for this domain with limited training. We also show that the use of out-of-domain data and pragmatic information, if available, can each further improve performance by 1-3%.

IC992472.PDF (From Author) IC992472.PDF (Rasterized)



Initial Evaluation of Hidden Dynamic Models on Conversational Speech

Authors:

Joseph Picone,
Sandi Pike,
Roland Reagan,
Terri Kamm,
John S Bridle, Dragon Systems U.K. (U.K.)
Li Deng,
Jeff Ma,
Hywel B Richards, Dragon Systems U.K. (U.K.)
Mike Schuster,

Page (NA) Paper number 2339

Abstract:

Conversational speech recognition is a challenging problem primarily because speakers rarely fully articulate sounds. A successful speech recognition approach must infer intended spectral targets from the speech data, or develop a method of dealing with large variances in the data. Hidden Dynamic Models (HDMs) attempt to automatically learn such targets in a hidden feature space using models that integrate linguistic information with constrained temporal trajectory models. HDMs are a radical departure from conventional hidden Markov models (HMMs), which simply account for variation in the observed data. In this paper, we present an initial evaluation of such models on a conversational speech recognition task involving a subset of the SWITCHBOARD corpus. We show that in an N-Best rescoring paradigm, HDMs are capable of delivering performance competitive with HMMs.

IC992339.PDF (From Author) IC992339.PDF (Rasterized)



Convolutional Density Estimation in Hidden Markov Models for Speech Recognition

Authors:

Spyros Matsoukas,
George Zavaliagkos,

Page (NA) Paper number 2379

Abstract:

In continuous density Hidden Markov Models (HMMs) for speech recognition, the probability density function (pdf) for each state is usually expressed as a mixture of Gaussians. In this paper, we present a model in which the pdf is expressed as the convolution of two densities. We focus on the special case where one of the convolved densities is an M-Gaussian mixture and the other is a mixture of N impulses. We present the re-estimation formulae for the parameters of the MxN convolutional model and suggest two ways of initializing them: a residual K-Means approach, and deconvolution from a standard HMM with MN Gaussians per state, using a genetic algorithm to search for the optimal assignment of Gaussians. Both methods result in a compact representation that requires only O(M + N) storage space for the model parameters, and O(MN) time for training and decoding. We explain how the decoding time can be reduced to O(M + kN), where k < M. Finally, results are shown on the 1996 Hub-4 Development test, demonstrating that a 32x2 convolutional model can achieve performance comparable to that of a standard 64-Gaussian per state model.
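
To make the storage saving concrete: convolving an M-component Gaussian mixture with an N-impulse mixture yields a density equivalent to an MN-component Gaussian mixture whose means are all sums mu_m + delta_n, while only M + N component parameter sets need to be stored. A minimal sketch, with all weights and parameters illustrative rather than taken from the paper:

```python
import numpy as np

def gauss(x, mu, var):
    """Scalar Gaussian density."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def conv_pdf(x, c, mu, var, w, delta):
    """Density of (M-Gaussian mixture) convolved with (N-impulse mixture),
    evaluated by expanding into the equivalent MN-component mixture:
    each component has mean mu_m + delta_n and weight c_m * w_n."""
    total = 0.0
    for cm, m, v in zip(c, mu, var):
        for wn, d in zip(w, delta):
            total += cm * wn * gauss(x, m + d, v)
    return total

# M = 2 Gaussians, N = 2 impulses: 4 effective components,
# but only M + N sets of location/weight parameters are stored.
c, mu, var = [0.4, 0.6], [0.0, 1.0], [0.5, 0.8]
w, delta = [0.5, 0.5], [-0.2, 0.2]

xs = np.linspace(-8.0, 10.0, 4000)
dx = xs[1] - xs[0]
area = sum(conv_pdf(x, c, mu, var, w, delta) for x in xs) * dx
print(round(area, 2))  # integrates to ~1, i.e. a valid density
```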

IC992379.PDF (From Author) IC992379.PDF (Rasterized)



Automatic Clustering And Generation Of Contextual Questions For Tied States In Hidden Markov Models

Authors:

Rita Singh,
Bhiksha Raj,
Richard M Stern,

Page (NA) Paper number 2487

Abstract:

Most current automatic speech recognition systems based on HMMs cluster or tie together subsets of the subword units with which speech is represented. This tying improves recognition accuracy when systems are trained with limited data, and is performed by classifying the sub-phonetic units using a series of binary tests based on speech production, called "linguistic questions". This paper describes a new method for automatically determining the best combinations of subword units to form these questions. The hybrid algorithm proposed clusters state distributions of context-independent phones to obtain questions for triphonetic contexts. Experiments confirm that the questions thus generated can replace manually generated questions and can provide improved recognition accuracy. Automatic generation of questions has the additional important advantage of extensibility to languages for which the phonetic structure is not well understood by the system designer, and can be effectively used in situations where the subword units are not phonetically motivated.
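
A toy sketch of the flavor of such hybrid clustering: agglomeratively group context-independent phone models by a distance between their distributions, and treat every merged cluster as a candidate "linguistic question", i.e. a phone set the tied-state decision tree may test jointly. The phone set, the single-mean summaries, and the Euclidean distance below are all illustrative stand-ins, not the paper's actual criterion:

```python
import numpy as np

# Hypothetical context-independent phone models, each summarized here by a
# single mean vector (a real system would cluster full state distributions,
# e.g. with a likelihood or divergence criterion).
phones = ["aa", "ae", "iy", "ih", "s", "z", "f", "v"]
rng = np.random.default_rng(0)
means = {p: rng.normal(size=4) for p in phones}

def agglomerate(items, vec):
    """Bottom-up clustering: every merged cluster becomes a candidate
    question (a set of phones) for triphone-context tree building."""
    clusters = [({p}, vec[p]) for p in items]
    questions = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = np.linalg.norm(clusters[i][1] - clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (si, vi), (sj, vj) = clusters[i], clusters[j]
        # merge the closest pair; new centroid is the size-weighted mean
        merged = (si | sj, (len(si) * vi + len(sj) * vj) / (len(si) + len(sj)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        questions.append(sorted(merged[0]))
    return questions

questions = agglomerate(phones, means)
print(len(questions))  # 8 phones -> 7 merges -> 7 candidate questions
```

Because the questions are derived from the data rather than written by hand, the same procedure carries over to languages whose phonetic structure the system designer does not know well, which is the extensibility argument the abstract makes.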

IC992487.PDF (From Author) IC992487.PDF (Rasterized)



Partly Hidden Markov Model and its Application to Speech Recognition

Authors:

Tetsunori Kobayashi,
Junko Furuyama,
Ken Masumitsu,

Page (NA) Paper number 2323

Abstract:

A new pattern matching method, the Partly Hidden Markov Model, is proposed and applied to speech recognition. The Hidden Markov Model, which is widely used for speech recognition, can deal only with piecewise stationary stochastic processes. We solved this problem by introducing a modified second-order Markov Model, in which the first state is hidden and the second one is observable. In this model, not only the feature parameter observations but also the state transitions are dependent on the previous feature observation. Therefore, even complicated transients can be modeled precisely. Simulation experiments showed the high potential of the proposed model. In word recognition tests, the error rate was reduced by 39% compared with a normal HMM.
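
Read literally, the model conditions both the emission and the hidden-state transition at time t on the previous observation. For a discrete-observation toy version, the forward recursion then indexes the transition and emission tables by the previous symbol. The sketch below is a hypothetical illustration of that dependence, not the paper's formulation:

```python
import numpy as np

# Toy setup: S hidden states, V discrete observation symbols. Unlike a
# standard HMM, the transition table A and emission table B are indexed
# by the PREVIOUS observation symbol as well.
S, V = 3, 4
rng = np.random.default_rng(0)
A = rng.dirichlet(np.ones(S), size=(V, S))   # A[prev_obs][i][j]
B = rng.dirichlet(np.ones(V), size=(V, S))   # B[prev_obs][j][obs_t]
pi = np.full(S, 1.0 / S)                     # uniform initial states
B0 = rng.dirichlet(np.ones(V), size=S)       # emissions at t=0 (no prev obs)

def forward_likelihood(obs):
    """Forward algorithm with observation-dependent transitions/emissions."""
    alpha = pi * B0[:, obs[0]]
    for t in range(1, len(obs)):
        prev = obs[t - 1]
        alpha = (alpha @ A[prev]) * B[prev][:, obs[t]]
    return alpha.sum()

p = forward_likelihood([0, 2, 1, 3])
print(0.0 < p < 1.0)  # a proper probability under this toy model
```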

IC992323.PDF (From Author) IC992323.PDF (Rasterized)



Hidden Markov Models With Divergence Based Vector Quantized Variances

Authors:

Jae H Kim,
Raziel Haimi-Cohen,
Frank K Soong,

Page (NA) Paper number 2337

Abstract:

This paper describes a method to significantly reduce the complexity of continuous density HMMs with only a small degradation in performance. The proposed method is noise-robust and may perform even better than the standard algorithm if training and testing noise conditions are not matched. The method is based on approximating the variance vectors of the Gaussian kernels by a vector quantization (VQ) codebook of small size. The quantization of the variance vectors is done using an information-theoretic distortion measure. Closed-form expressions are given for the computation of the VQ codebook, and the superiority of the proposed distortion measure over the Euclidean distance is demonstrated. The effectiveness of the proposed method is shown using the connected TI digits database and a noisy version of it. On this database, quantizing the variances to 16 levels maintains recognition performance within 1% degradation of the original system. In comparison, with Euclidean distortion, a codebook of size 256 is needed for a similar error rate.
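
The abstract does not spell out the distortion measure. One standard information-theoretic choice for diagonal variance vectors is a symmetric KL-style divergence between zero-mean Gaussians, which happens to admit a closed-form codeword (centroid) update; the sketch below uses that choice inside a Lloyd iteration purely as an illustration, and may differ from the paper's actual measure:

```python
import numpy as np

def divergence(v, c):
    """Symmetric-KL-style distortion between diagonal variance vectors
    (zero-mean Gaussians): sum_i (v_i/c_i + c_i/v_i - 2). Zero iff v == c."""
    return np.sum(v / c + c / v - 2.0, axis=-1)

def centroid(vs):
    """Closed-form minimizer of the distortion above over a cell:
    setting the derivative to zero gives c_i = sqrt(mean(v_i) / mean(1/v_i))."""
    return np.sqrt(vs.mean(axis=0) / (1.0 / vs).mean(axis=0))

def vq_variances(variances, codebook_size, iters=20, seed=0):
    """Lloyd iteration: assign each variance vector to its nearest codeword
    under the divergence, then re-estimate codewords in closed form."""
    rng = np.random.default_rng(seed)
    book = variances[rng.choice(len(variances), codebook_size, replace=False)]
    for _ in range(iters):
        dists = np.stack([divergence(variances, c) for c in book], axis=1)
        assign = dists.argmin(axis=1)
        book = np.stack([
            centroid(variances[assign == k]) if np.any(assign == k) else book[k]
            for k in range(codebook_size)
        ])
    return book, assign

rng = np.random.default_rng(1)
variances = rng.uniform(0.1, 2.0, size=(200, 8))  # toy variance vectors
book, assign = vq_variances(variances, codebook_size=16)
print(book.shape)  # (16, 8): a 16-level codebook of 8-dim variance vectors
```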

IC992337.PDF (From Author) IC992337.PDF (Rasterized)



HMM Training Based on Quality Measurement

Authors:

Yuqing Gao,
Ea-Ee Jan,
Mukund Padmanabhan,
Michael Picheny,

Page (NA) Paper number 2483

Abstract:

Two discriminant measures for HMM states to improve the effectiveness of HMM training are presented in this paper. In HMM-based speech recognition, the context-dependent states are usually modeled by Gaussian mixture distributions. In general, the number of Gaussian mixtures for each state is fixed or proportional to the amount of training data. From our study, some of the states are ``non-aggressive'' compared to others, and a higher acoustic resolution is required for them. Two methods are presented in this paper to determine those non-aggressive states. The first approach uses the recognition accuracy of the states, and the second is based on a rank distribution of the states. Baseline systems with 33K and 120K Gaussians, trained with a fixed number of Gaussian mixtures for each state, yield 14.57% and 13.04% word error rates, respectively. Using our approaches, a 38K-Gaussian system was constructed that reduces the error rate to 13.95%. The average ranks of the non-aggressive states in the rank lists of the testing data also seem to improve dramatically compared to the baseline systems.

IC992483.PDF (From Author) IC992483.PDF (Rasterized)



Prosodic Word Boundary Detection Using Statistical Modeling of Moraic Fundamental Frequency Contours and Its Use for Continuous Speech Recognition

Authors:

Koji Iwano,
Keikichi Hirose,

Page (NA) Paper number 2237

Abstract:

A new method for prosodic word boundary detection in continuous speech was developed based on the statistical modeling of moraic transitions of fundamental frequency (F0) contours, formerly proposed by the authors. In this method, F0 contours of prosodic words were modeled separately according to accent types. An input utterance was matched against the models and divided into its constituent prosodic words, yielding the prosodic word boundaries. The method was first applied to boundary detection experiments on the ATR continuous speech corpus. With mora boundary locations given in the corpus, the total detection rate reached 91.5%. The method was then integrated into a continuous speech recognition scheme with unlimited vocabulary. An improvement of a few percentage points was observed in mora recognition for the above corpus. Although all the experiments were done in closed conditions due to corpus availability, the results indicate the usefulness of the proposed method.

IC992237.PDF (From Author) IC992237.PDF (Rasterized)
