9:30, SPEECH-L3.1
AN EKF-BASED ALGORITHM FOR LEARNING STATISTICAL HIDDEN DYNAMIC MODEL PARAMETERS FOR PHONETIC RECOGNITION
R. TOGNERI, L. DENG
This paper presents a new parameter estimation algorithm based on the
Extended Kalman Filter (EKF) for the recently proposed statistical
coarticulatory Hidden Dynamic Model (HDM). We show how the EKF parameter estimation algorithm unifies and simplifies the estimation of both the state and parameter vectors. Experiments based on N-best rescoring demonstrate superior performance of the (context-independent) HDM over a triphone baseline HMM in the TIMIT phonetic recognition task. We also show that the HDM is capable of generating speech vectors close to those from the corresponding real data.
9:50, SPEECH-L3.2
CONTINUOUS SPEECH RECOGNITION USING A HIERARCHICAL BAYESIAN MODEL
F. MOURIA-BEJI
This work proposes a stochastic model for continuous speech recognition that provides automatic segmentation of spoken utterances into phonemes and facilitates the quantitative assessment of uncertainty associated with the identified utterance features. The model is specified hierarchically within the Bayesian paradigm. At the lowest level of the hierarchy, a Gibbs distribution specifies a probability distribution over all possible partitions of the utterance; the number of partition elements, i.e. phonemes, is not specified a priori. At a higher level of the hierarchical specification, random variables representing phoneme durations and acoustic vector values are associated with each phoneme and frame. The posterior distribution is estimated using a Gibbs sampler scheme.
10:10, SPEECH-L3.3
INDICATOR VARIABLE DEPENDENT OUTPUT PROBABILITY MODELLING VIA CONTINUOUS POSTERIOR FUNCTIONS
A. TUERK, S. YOUNG
This paper investigates the problem of inserting an additional hidden
variable into a standard
HMM. It is shown that this can be done by introducing a continuous
feature which is used to calculate the probability of observing the
different states of the hidden variable. The posteriors are modelled
by softmax functions with polynomial exponents and an efficient method
is developed for reestimating their parameters. After analysing a
two-dimensional reestimation example on artificial data, the proposed
HMM is evaluated on the 1997 Broadcast
News task with a particular focus on spontaneous speech. To derive a
good indicator variable for this purpose, classification experiments
are carried out on fast and slow classes of phones on the 1997
Broadcast News training data. Finally, recognition experiments on the
test set of this task show that the proposed model gives an
improvement over a standard HMM with a comparable number of
parameters.
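A minimal sketch of the kind of model the abstract describes: the posterior over the hidden variable's states, given a continuous indicator feature c, is a softmax whose exponents are polynomials in c. The quadratic degree and the coefficient values below are illustrative assumptions, not the paper's trained parameters.

```python
import numpy as np

# Sketch: posterior P(k | c) over K hidden-variable states, given a scalar
# continuous indicator c, as a softmax with polynomial exponents
# (here quadratic: w_k0 + w_k1*c + w_k2*c^2).  Coefficients are made up.

def softmax_poly_posterior(c, W):
    """W is (K, D+1): per-state polynomial coefficients, degree D."""
    powers = np.array([c ** d for d in range(W.shape[1])])  # [1, c, c^2, ...]
    scores = W @ powers
    e = np.exp(scores - scores.max())                       # stable softmax
    return e / e.sum()

W = np.array([[0.0, -2.0,  1.0],    # state 0 favoured for small c
              [0.0,  2.0, -1.0]])   # state 1 favoured for larger c
p = softmax_poly_posterior(1.0, W)
```

The softmax guarantees the state posteriors are positive and sum to one for any indicator value, which is what makes the continuous feature usable as a hidden-variable observation probability.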
10:30, SPEECH-L3.4
INVESTIGATING LIGHTLY SUPERVISED ACOUSTIC MODEL TRAINING
L. LAMEL, J. GAUVAIN, G. ADDA
The last decade has witnessed substantial progress in speech
recognition technology, with today's state-of-the-art systems able
to transcribe broadcast audio data with a word error rate of about
20%. However, acoustic model development for the recognizers requires
large corpora of manually transcribed training data. Obtaining such
data is both time-consuming and expensive, requiring trained human
annotators and substantial amounts of supervision.
In this paper we describe some recent experiments using different
levels of supervision for acoustic model training in order to reduce
the system development cost. The experiments have been carried out
using the DARPA TDT-2 corpus (also used in the SDR99 and SDR00
evaluations). Our experiments demonstrate that light supervision is
sufficient for acoustic model development, drastically reducing the
development cost.
10:50, SPEECH-L3.5
MULTIPLE LINEAR TRANSFORMS
N. GOEL, R. GOPINATH
Over the past several years, Linear Discriminant Analysis (LDA) has been replaced by Heteroscedastic Discriminant Analysis (HDA) to improve the performance of recognition systems that model the data with a mixture of diagonal-covariance prototypes. A specific version of HDA, popularly known as the Maximum Likelihood Linear Transform (MLLT), is also applied to the final features. However, the performance of such systems is not as good as that of a corresponding system using full covariance matrices. We propose the method of Multiple Linear Transforms (MLT), which bridges this gap in performance while maintaining the speed efficiency of a diagonal-covariance system. In other words, this technique improves the performance of a diagonal-covariance system beyond what can be obtained with HDA or MLLT.
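The transform-based likelihood computation underlying MLLT-style systems can be sketched as follows: features pass through a shared linear transform A, each Gaussian keeps only a diagonal covariance in the transformed space, and a log|det A| term accounts for the change of variables. (MLT, per the abstract, generalizes this by allowing several such transforms.) The transform and statistics here are illustrative, not from the paper.

```python
import numpy as np

# Sketch: log-likelihood of a feature vector under a diagonal-covariance
# Gaussian applied in a linearly transformed feature space.  The
# log|det A| term makes likelihoods comparable across transforms.

def diag_loglik_with_transform(x, A, mean, var):
    y = A @ x
    ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (y - mean) ** 2 / var)
    return ll + np.log(abs(np.linalg.det(A)))

# With A = I this reduces to a plain diagonal-covariance Gaussian.
x = np.array([0.3, -0.2])
mean, var = np.zeros(2), np.ones(2)
ll_id = diag_loglik_with_transform(x, np.eye(2), mean, var)
```

Keeping the covariances diagonal is what preserves the speed of a diagonal-covariance system: the per-frame cost stays linear in the feature dimension after the one shared matrix-vector product.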
11:10, SPEECH-L3.6
RELAX FRAME INDEPENDENCE ASSUMPTION FOR STANDARD HMMS BY STATE DEPENDENT AUTO-REGRESSIVE FEATURE MODELS
Y. JIA, J. LI
In this paper, we propose a new type of frame-based hidden Markov model (HMM) in which a sequence of observations is generated using state-dependent auto-regressive feature models. Based on this correlation model, it can be proved that expressing the probability of a sequence of observations as a product of probabilities of decorrelated individual observations does not require the frame-independence assumption. Under the maximum likelihood (ML) criterion, we also derive re-estimation formulae for the parameters (mean vectors, covariance matrices, and diagonal regression matrices) of the new HMMs using an Expectation-Maximization (EM) algorithm. From the formulae, it is interesting to see that the new HMMs extend the standard HMMs by relaxing the frame-independence limitation. An initial experiment conducted on the WSJ20K task shows an encouraging performance improvement with only 117 additional parameters in all.
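The observation model the abstract describes can be sketched as a state-dependent first-order auto-regression: given state s, frame o_t is Gaussian with mean mu_s + r_s * (o_{t-1} - mu_s), where the diagonal regression matrix reduces to a vector r_s. The functional form, the convention for the first frame, and all values below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Sketch: per-state AR(1) observation log-likelihood.  With r = 0 the
# model collapses to the standard frame-independent diagonal Gaussian.

def ar_state_loglik(obs, mu, r, var):
    """Log-likelihood of an observation sequence under one state's AR model."""
    ll = 0.0
    prev = mu                            # assumed convention for the first frame
    for o in obs:
        pred = mu + r * (prev - mu)      # regress on the previous frame
        ll += -0.5 * np.sum(np.log(2 * np.pi * var) + (o - pred) ** 2 / var)
        prev = o
    return ll

mu, var = np.zeros(2), np.ones(2)
obs = [np.array([0.5, -0.5]), np.array([0.4, -0.4])]
ll_ar = ar_state_loglik(obs, mu, np.full(2, 0.8), var)   # correlated model
ll_id = ar_state_loglik(obs, mu, np.zeros(2), var)       # frame-independent
```

On correlated frames like these, the AR model predicts each frame from its predecessor and so assigns a higher likelihood than the frame-independent special case — the relaxation the abstract claims, at the cost of only the regression vectors as extra parameters.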