ICASSP99 Acoustic Modeling III

Acoustic Modeling III
Home Full List of Titles 1: Speech Processing CELP Coding Large Vocabulary Recognition Speech Analysis and Enhancement Acoustic Modeling I ASR Systems and Applications Topics in Speech Coding Speech Analysis Low Bit Rate Speech Coding I Robust Speech Recognition in Noisy Environments Speaker Recognition Acoustic Modeling II Speech Production and Synthesis Feature Extraction Robust Speech Recognition and Adaptation Low Bit Rate Speech Coding II Speech Understanding Language Modeling I 2: Speech Processing, Audio and Electroacoustics, and Neural Networks Acoustic Modeling III Lexical Issues/Search Speech Understanding and Systems Speech Analysis and Quantization Utterance Verification/Acoustic Modeling Language Modeling II Adaptation /Normalization Speech Enhancement Topics in Speaker and Language Recognition Echo Cancellation and Noise Control Coding Auditory Modeling, Hearing Aids and Applications of Signal Processing to Audio and Acoustics Spatial Audio Music Applications Application - Pattern Recognition & Speech Processing Theory & Neural Architecture Signal Separation Application - Image & Nonlinear Signal Processing 3: Signal Processing Theory & Methods I Filter Design and Structures Detection Wavelets Adaptive Filtering: Applications and Implementation Nonlinear Signals and Systems Time/Frequency and Time/Scale Analysis Signal Modeling and Representation Filterbank and Wavelet Applications Source and Signal Separation Filterbanks Emerging Applications and Fast Algorithms Frequency and Phase Estimation Spectral Analysis and Higher Order Statistics Signal Reconstruction Adaptive Filter Analysis Transforms and Statistical Estimation Markov and Bayesian Estimation and Classification 4: Signal Processing Theory & Methods II, Design and Implementation of Signal Processing Systems, Special Sessions, and Industry Technology Tracks System Identification, Equalization, and Noise Suppression Parameter Estimation Adaptive Filters: Algorithms and Performance DSP Development Tools VLSI Building Blocks DSP Architectures DSP System Design Education Recent Advances in Sampling Theory and Applications Steganography: Information Embedding, Digital Watermarking, and Data Hiding Speech Under Stress Physics-Based Signal Processing DSP Chips, Architectures and Implementations DSP Tools and Rapid Prototyping Communication Technologies Image and Video Technologies Automotive Applications / Industrial Signal Processing Speech and Audio Technologies Defense and Security Applications Biomedical Applications Voice and Media Processing Adaptive Interference Cancellation 5: Communications, Sensor Array and Multichannel Source Coding and Compression Compression and Modulation Channel Estimation and Equalization Blind Multiuser Communications Signal Processing for Communications I CDMA and Space-Time Processing Time-Varying Channels and Self-Recovering Receivers Signal Processing for Communications II Blind CDMA and Multi-Channel Equalization Multicarrier Communications Detection, Classification, Localization, and Tracking Radar and Sonar Signal Processing Array Processing: Direction Finding Array Processing Applications I Blind Identification, Separation, and Equalization Antenna Arrays for Communications Array Processing Applications II 6: Multimedia Signal Processing, Image and Multidimensional Signal Processing, Digital Signal Processing Education Multimedia Analysis and Retrieval Audio and Video Processing for Multimedia Applications Advanced Techniques in Multimedia Video Compression and Processing Image Coding Transform Techniques Restoration and Estimation Image Analysis Object Identification and Tracking Motion Estimation Medical Imaging Image and Multidimensional Signal Processing Applications I Segmentation Image and Multidimensional Signal Processing Applications II Facial Recognition and Analysis Digital Signal Processing Education Author Index A B C D E F G H I J K L M N O P Q R S T U V W X Y Z	Connected Digit Recognition Using Short and Long Duration Models Authors: Cristina Chesta, Pietro Laface, Franco Ravera, Page (NA) Paper number 1465 Abstract: In this paper we show that accurate HMMs for connected word recognition can be obtained without context dependent modeling and discriminative training. We train two HMMs for each word that have the same, standard, left to right topology with the possibility of skipping one state, but each model has a different number of states, automatically selected. The two models account for different speaking rates that occur not only in different utterances of the speakers, but also within a connected word utterance of the same speaker. This simple modeling technique has been applied to connected digit recognition using the adult speaker portion of the TI/NIST corpus giving the best results reported so far for this database. It has also been tested on telephone speech using long sequences of Italian digits (credit card numbers), giving better results with respect to classical models with a larger number of densities. IC991465.PDF (From Author) IC991465.PDF (Rasterized) TOP Discriminative Training Via Linear Programming Authors: Kishore A Papineni, Page (NA) Paper number 2204 Abstract: This paper presents a linear programming approach to discriminative training. We first define a measure of discrimination of an arbitrary conditional probability model on a set of labeled training data. We consider maximizing discrimination on a parametric family of exponential models that arises naturally in the maximum entropy framework. We show that this optimization problem is globally convex in R^n, and is moreover piece-wise linear on R^n. We propose a solution that involves solving a series of linear programming problems. We provide a characterization of global optimizers. We compare this framework with those of minimum classification error and maximum entropy. IC992204.PDF (From Author) IC992204.PDF (Rasterized) TOP Refining Tree-Based State Clustering by Means of Formal Concept Analysis, Balanced Decision Trees and Automatically Generated Model-Sets Authors: Daniel Willett, Christoph Neukirchen, Jörg Rottland, Gerhard Rigoll, Page (NA) Paper number 1633 Abstract: Decision tree-based state clustering has emerged in recent years as the most popular approach for clustering the states of context dependent hidden Markov model based speech recognizers. The application of sets of phones, mainly phonetically motivated, that limit the possible clusters, results in a reasonably good modeling of unseen phones while it still enables to model specific phones very precisely whenever this is necessary and enough training data is available. Formal Concept Analysis, a young mathematical discipline, provides means for the treatment of sets and sets of sets that are well suited for further improving tree-based state clustering. The possible refinements are outlined and evaluated in this paper. The major merit is the proposal of procedures for the adaptation of the number of sets used for clustering to the amount of available training data, and of a method that generates suitable sets automatically without the incorporation of additional knowledge. IC991633.PDF (From Author) IC991633.PDF (Rasterized) TOP Efficient Speech Recognition Using Subvector Quantization And Discrete-Mixture HMMs Authors: Stavros Tsakalidis, Vassilis Digalakis, Leonardo G Neumeyer, Page (NA) Paper number 2012 Abstract: This paper introduces a new form of observation distributions for hidden Markov models (HMMs), combining subvector quantization and mixtures of discrete distributions. We present efficient training and decoding algorithms for the discrete-mixture HMMs (DMHMMs). Our experimental results in the air-travel information domain show that the high-level of recognition accuracy of continuous mixture-density HMMs (CDHMMs) can be maintained at significantly faster decoding speeds. Moreover, we show that when the same number of mixture components is used in DMHMMs and CDHMMs, the new models exhibit superior recognition performance. IC992012.PDF (From Author) IC992012.PDF (Rasterized) TOP A Unified Approach Of Incorporating General Features In Decision Tree Based Acoustic Modeling Authors: Wolfgang Reichl, Wu Chou, Page (NA) Paper number 2377 Abstract: In this paper, a unified maximum likelihood framework of incorporating phonetic and non-phonetic features in decision tree based acoustic modeling is proposed. Unlike phonetic features, non-phonetic features in this context are those features, which cannot be derived from the phoneme identities. Although non-phonetic features are used in speech recognition, they are often treated separately and based on various heuristics. In our approach, non-phonetic features are included as additional tags to the decision tree clustering. Moreover, the proposed tagged decision tree is based on the full training data, and therefore, it alleviates the problem of training data depletion in building specific feature dependent acoustic models. Experimental results indicate that up to 10% word error rate reduction can be achieved in a large vocabulary (Wall Street Journal) speech recognition task based on the proposed approach. IC992377.PDF (From Author) IC992377.PDF (Rasterized) TOP Irrelevant Variability Normalization in Learning HMM State Tying From Data Based on Phonetic Decision-Tree Authors: Qiang Huo, Department of Computer Science and Information Systems, The University of Hong Kong, Pokfulam Road, Hong Kong (Hong Kong) Bin Ma, Department of Computer Science and Information Systems, The University of Hong Kong, Pokfulam Road, Hong Kong (Hong Kong) Page (NA) Paper number 1825 Abstract: We propose to apply the concept of irrelevant variability normalization to the general problem of learning structure from data. Because of the problems of a diversified training data set and/or possible acoustic mismatches between training and testing conditions, the structure learned from the training data by using a maximum likelihood training method will not necessarily generalize well on mismatched tasks. We apply the above concept to the structural learning problem of phonetic decision-tree based hidden Markov model (HMM) state tying. We present a new method that integrates a linear-transformation based normalization mechanism into the decision-tree construction process to make the learned structure have a better modeling capability and generalizability. The viability and efficacy of the proposed method are confirmed in a series of experiments for continuous speech recognition of Mandarin Chinese. IC991825.PDF (From Author) IC991825.PDF (Rasterized) TOP Discriminative Spectral-Temporal Multi-Resolution Features For Speech Recognition Authors: Philip McMahon, The Queen's University of Belfast, Northern Ireland (Ireland) Naomi Harte, The Queen's University of Belfast, Northern Ireland (Ireland) Saeed Vaseghi, The Queen's University of Belfast, Northern Ireland (Ireland) Paul McCourt, The Queen's University of Belfast, Northern Ireland (Ireland) Page (NA) Paper number 1649 Abstract: Multi-resolution features, which are based on the premise that there may be more cues for phonetic discrimination in a given sub-band than in another, have been shown to outperform the standard MFCC feature set for both classification and recognition tasks on the TIMIT database [5]. This paper presents an investigation into possible strategies to extend these ideas from the spectral domain into both the spectral and temporal domains. Experimental work on the integration of segmental models, which are better at capturing the longer term phonetic correlation of a phonetic unit, into the discriminative multi-resolution framework is presented. Results are presented which show that including this supplementary temporal information offers an improvement performance for the phoneme classification task over the standard multi-resolution MFCC feature set with time derivatives appended. Possible strategies for the extension of theses techniques into the area of continuous speech recognition are discussed. IC991649.PDF (From Author) IC991649.PDF (Rasterized) TOP On the use of Support Vector Machines for Phonetic Classification Authors: Philip R Clarkson, Pedro J Moreno, Page (NA) Paper number 2104 Abstract: Support Vector Machines (SVMs) represent a new approach to pattern classification which has recently attracted a great deal of interest in the machine learning community. Their appeal lies in their strong connection to the underlying statistical learning theory, in particular the theory of Structural Risk Minimization. SVMs have been shown to be particularly successful in fields such as image identification and face recognition; in many problems SVM classifiers have been shown to perform much better than other non-linear classifiers such as artificial neural networks and k-nearest neighbors. This paper explores the issues involved in applying SVMs to phonetic classification as a first step to speech recognition. We present results on several standard vowel and phonetic classification tasks and show better performance than Gaussian mixture classifiers. We also present an analysis of the difficulties we foresee in applying SVMs to continuous speech recognition problems. IC992104.PDF (From Author) IC992104.PDF (Rasterized) TOP