Abstract: Session SP-18

SP-18.1  

Connected Digit Recognition Using Short and Long Duration Models
Cristina Chesta, Pietro Laface (Politecnico di Torino), Franco Ravera (CSELT, Torino)

In this paper we show that accurate HMMs for connected word recognition can be obtained without context-dependent modeling and discriminative training. We train two HMMs for each word that have the same standard left-to-right topology with the possibility of skipping one state, but each model has a different, automatically selected number of states. The two models account for different speaking rates that occur not only in different utterances of the speakers, but also within a connected word utterance of the same speaker. This simple modeling technique has been applied to connected digit recognition using the adult speaker portion of the TI/NIST corpus, giving the best results reported so far for this database. It has also been tested on telephone speech using long sequences of Italian digits (credit card numbers), giving better results than classical models with a larger number of densities.
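
As an illustration of the topology described in this abstract, the following minimal sketch builds the transition matrix of a left-to-right HMM in which each state may loop, advance, or skip one state. The function name, the probability values, and the two state counts are hypothetical; the abstract does not give them, and the paper selects the number of states automatically.

```python
def left_to_right_transitions(num_states, p_stay=0.5, p_next=0.4, p_skip=0.1):
    """Row-stochastic transition matrix for a left-to-right HMM with
    self-loops, single-step advances, and one-state skips.
    The default probabilities are illustrative placeholders."""
    matrix = [[0.0] * num_states for _ in range(num_states)]
    for i in range(num_states):
        # Candidate moves from state i: stay, go to i+1, skip to i+2.
        moves = [(i, p_stay), (i + 1, p_next), (i + 2, p_skip)]
        allowed = [(j, p) for j, p in moves if j < num_states]
        total = sum(p for _, p in allowed)
        for j, p in allowed:
            # Renormalise where moves would leave the model (exit
            # transitions are omitted in this simplified sketch).
            matrix[i][j] = p / total
    return matrix

# Two models of the same word with different (assumed) numbers of states,
# standing in for the short- and long-duration models.
short_model = left_to_right_transitions(6)
long_model = left_to_right_transitions(12)
```

With the same topology, the model with fewer states needs fewer frames to traverse, so it tends to fit faster pronunciations, while the longer model tends to fit slower ones.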


SP-18.2  

Discriminative Training Via Linear Programming
Kishore A Papineni (IBM T. J. Watson Research Center)

This paper presents a linear programming approach to discriminative training. We first define a measure of discrimination of an arbitrary conditional probability model on a set of labeled training data. We consider maximizing discrimination on a parametric family of exponential models that arises naturally in the maximum entropy framework. We show that this optimization problem is globally convex in $R^n$, and is moreover piece-wise linear on $R^n$. We propose a solution that involves solving a series of linear programming problems. We provide a characterization of global optimizers. We compare this framework with those of minimum classification error and maximum entropy.


SP-18.3  

Refining Tree-Based Clustering by Means of Formal Concept Analysis, Balanced Decision Trees and Automatically Generated Model-Sets
Daniel Willett, Christoph Neukirchen, Jörg Rottland, Gerhard Rigoll (Gerhard-Mercator-University Duisburg)

Decision tree-based state clustering has emerged in recent years as the most popular approach for clustering the states of context-dependent hidden Markov model-based speech recognizers. The application of sets of phones, mainly phonetically motivated, that limit the possible clusters results in reasonably good modeling of unseen phones, while still allowing specific phones to be modeled very precisely whenever this is necessary and enough training data is available. Formal Concept Analysis, a young mathematical discipline, provides means for the treatment of sets and sets of sets that are well suited for further improving tree-based state clustering. The possible refinements are outlined and evaluated in this paper. The major merit is the proposal of procedures for adapting the number of sets used for clustering to the amount of available training data, and of a method that generates suitable sets automatically without the incorporation of additional knowledge.
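
To make the role of the phone sets concrete, the sketch below scores one candidate split in conventional decision tree-based state clustering: the states in a cluster are divided by whether their context phone belongs to a given set, and the split is rated by the gain in log-likelihood under single Gaussians. This shows the standard baseline procedure only, not the Formal Concept Analysis refinements proposed in the paper; the scalar observations, phone labels, and function names are all hypothetical.

```python
import math

def cluster_log_likelihood(values):
    """Log-likelihood of scalar samples under their own maximum
    likelihood Gaussian (a variance floor avoids log(0))."""
    n = len(values)
    mean = sum(values) / n
    var = max(sum((v - mean) ** 2 for v in values) / n, 1e-6)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def split_gain(samples_by_context, phone_set):
    """Log-likelihood gain from asking: is the context phone in phone_set?"""
    yes = [v for ctx, vals in samples_by_context.items()
           if ctx in phone_set for v in vals]
    no = [v for ctx, vals in samples_by_context.items()
          if ctx not in phone_set for v in vals]
    if not yes or not no:
        return 0.0  # the question does not divide the cluster
    parent = yes + no
    return (cluster_log_likelihood(yes) + cluster_log_likelihood(no)
            - cluster_log_likelihood(parent))

# Toy data: observations of one HMM state grouped by left-context phone.
samples = {"m": [1.0, 1.2, 0.9], "n": [1.1, 0.8],
           "s": [4.0, 4.2], "f": [3.9, 4.1]}
nasal_gain = split_gain(samples, {"m", "n"})   # phonetically motivated set
mixed_gain = split_gain(samples, {"m", "s"})   # arbitrary set
```

In this toy data the phonetically motivated set yields the larger gain, which is why the choice and number of candidate sets matters for the quality of the resulting clusters.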


SP-18.4  

Efficient Speech Recognition Using Subvector Quantization and Discrete-Mixture HMMs
Stavros Tsakalidis (Technical University of Crete), Vassilios Digalakis (Technical University of Crete / SRI International), Leonardo G Neumeyer (SRI International)

This paper introduces a new form of observation distributions for hidden Markov models (HMMs), combining subvector quantization and mixtures of discrete distributions. We present efficient training and decoding algorithms for the discrete-mixture HMMs (DMHMMs). Our experimental results in the air-travel information domain show that the high level of recognition accuracy of continuous mixture-density HMMs (CDHMMs) can be maintained at significantly faster decoding speeds. Moreover, we show that when the same number of mixture components is used in DMHMMs and CDHMMs, the new models exhibit superior recognition performance.
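
The following sketch shows one plausible reading of how subvector quantization and a mixture of discrete distributions fit together. It assumes that each feature vector is cut into subvectors, each subvector is mapped to the nearest entry of its own codebook, and each mixture component multiplies one table lookup per subvector. The abstract does not state this factorisation, and all names, codebooks, and probabilities below are hypothetical.

```python
def quantize(subvector, codebook):
    """Index of the nearest codeword by squared Euclidean distance."""
    def distance(codeword):
        return sum((a - b) ** 2 for a, b in zip(subvector, codeword))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))

def subvector_indices(vector, boundaries, codebooks):
    """Quantize each slice vector[start:end] with its own codebook."""
    return [quantize(vector[start:end], codebook)
            for (start, end), codebook in zip(boundaries, codebooks)]

def discrete_mixture_likelihood(indices, weights, tables):
    """Assumed form: sum_k w_k * prod_s tables[k][s][indices[s]]."""
    total = 0.0
    for weight, component in zip(weights, tables):
        prob = weight
        for s, index in enumerate(indices):
            prob *= component[s][index]  # a table lookup, no Gaussian evaluation
        total += prob
    return total

# Toy setup: a 4-dimensional vector split into two 2-dimensional subvectors.
boundaries = [(0, 2), (2, 4)]
codebooks = [[(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 1.0)]]
weights = [0.6, 0.4]
tables = [
    [[0.9, 0.1], [0.2, 0.8]],  # component 0: one distribution per subvector
    [[0.5, 0.5], [0.5, 0.5]],  # component 1
]
indices = subvector_indices([0.1, 0.2, 0.9, 1.1], boundaries, codebooks)
likelihood = discrete_mixture_likelihood(indices, weights, tables)
```

Under this reading, the quantization is done once per frame and every state then needs only lookups and multiplications, which is consistent with the faster decoding reported in the abstract.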


SP-18.5  

A Unified Approach of Incorporating General Features in Decision Tree Based Acoustic Modeling
Wolfgang Reichl, Wu Chou (Bell Laboratories, Lucent Technologies)

In this paper, a unified maximum likelihood framework for incorporating phonetic and non-phonetic features in decision tree-based acoustic modeling is proposed. Unlike phonetic features, non-phonetic features in this context are those features that cannot be derived from the phoneme identities. Although non-phonetic features are used in speech recognition, they are often treated separately and based on various heuristics. In our approach, non-phonetic features are included as additional tags in the decision tree clustering. Moreover, the proposed tagged decision tree is based on the full training data, and it therefore alleviates the problem of training data depletion in building specific feature-dependent acoustic models. Experimental results indicate that up to 10% word error rate reduction can be achieved in a large vocabulary (Wall Street Journal) speech recognition task based on the proposed approach.


SP-18.6  

Irrelevant Variability Normalization in Learning HMM State Tying From Data Based on Phonetic Decision-Tree
Qiang Huo, Bin Ma (Department of Computer Science and Information Systems, The University of Hong Kong, Pokfulam Road, Hong Kong)

We propose to apply the concept of irrelevant variability normalization to the general problem of learning structure from data. Because of the problems of a diversified training data set and/or possible acoustic mismatches between training and testing conditions, the structure learned from the training data by using a maximum likelihood training method will not necessarily generalize well on mismatched tasks. We apply the above concept to the structural learning problem of phonetic decision-tree based hidden Markov model (HMM) state tying. We present a new method that integrates a linear-transformation based normalization mechanism into the decision-tree construction process to make the learned structure have a better modeling capability and generalizability. The viability and efficacy of the proposed method are confirmed in a series of experiments for continuous speech recognition of Mandarin Chinese.


SP-18.7  

Discriminative Spectral-Temporal Multi-Resolution Features for Speech Recognition
Philip McMahon, Naomi Harte, Saeed Vaseghi, Paul McCourt (The Queen’s University of Belfast, Northern Ireland)

Multi-resolution features, which are based on the premise that there may be more cues for phonetic discrimination in a given sub-band than in another, have been shown to outperform the standard MFCC feature set for both classification and recognition tasks on the TIMIT database [5]. This paper presents an investigation into possible strategies to extend these ideas from the spectral domain into both the spectral and temporal domains. Experimental work on the integration of segmental models, which are better at capturing the longer-term correlation within a phonetic unit, into the discriminative multi-resolution framework is presented. Results are presented which show that including this supplementary temporal information improves performance on the phoneme classification task over the standard multi-resolution MFCC feature set with time derivatives appended. Possible strategies for the extension of these techniques into the area of continuous speech recognition are discussed.


SP-18.8  

On the use of Support Vector Machines for Phonetic Classification
Philip R Clarkson (Cambridge University Engineering Department), Pedro J Moreno (Compaq Computer Corporation, Cambridge Research Lab)

Support Vector Machines (SVMs) represent a new approach to pattern classification which has recently attracted a great deal of interest in the machine learning community. Their appeal lies in their strong connection to the underlying statistical learning theory, in particular the theory of Structural Risk Minimization. SVMs have been shown to be particularly successful in fields such as image identification and face recognition; in many problems SVM classifiers have been shown to perform much better than other non-linear classifiers such as artificial neural networks and $k$-nearest neighbors. This paper explores the issues involved in applying SVMs to phonetic classification as a first step to speech recognition. We present results on several standard vowel and phonetic classification tasks and show better performance than Gaussian mixture classifiers. We also present an analysis of the difficulties we foresee in applying SVMs to continuous speech recognition problems.
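
For reference, the textbook soft-margin SVM training problem for labeled data $(x_i, y_i)$ with $y_i \in \{-1, +1\}$ can be written as below. This is the standard formulation, not one quoted from the paper; the kernel $K$ and penalty $C$ that the authors used are not given in the abstract.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i \\
\text{subject to}\quad & y_i \bigl( w^{\top} \phi(x_i) + b \bigr) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, N, \\[4pt]
\text{decision rule}\quad & f(x) = \operatorname{sign}\Bigl( \sum_{i=1}^{N} \alpha_i\, y_i\, K(x_i, x) + b \Bigr), \qquad K(x_i, x) = \phi(x_i)^{\top} \phi(x).
\end{aligned}
```

The formulation is binary and operates on fixed-length inputs, so multi-class phonetic labels and variable-length speech segments both need additional handling before an SVM can be applied.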



Last Update:  February 4, 1999         Ingo Höntsch