Authors:
Tatsuya Kawahara,
Shuji Doshita,
Paper number 1687
Abstract:
A topic-independent lexical and language model for robust key-phrase
detection and verification is presented. Instead of assuming a
domain-specific lexicon and language model, our model is designed to
characterize filler phrases according to speaking style, and can
therefore be trained with large corpora of different topics but the
same style. A mutual information criterion is used to select
topic-independent filler words, and their N-gram model is used to
verify key-phrase hypotheses. A dialogue-style-dependent filler model
improves key-phrase detection in different dialogue applications. A
lecture-style-dependent model is trained on transcriptions of various
oral presentations by filtering out topic-specific words. It verifies
key-phrases uttered in lectures on different topics much better than
the conventional syllable-based model and large-vocabulary model.
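As an illustration of the selection step, here is a minimal sketch of
scoring words by the mutual information between their occurrence and the
topic label, keeping low-MI words as topic-independent filler candidates.
The toy corpora, counts, and threshold are illustrative assumptions, not
data or code from the paper.

```python
# Sketch: rank words by I(W; Topic); low-MI words are filler candidates.
import math
from collections import Counter

def mutual_information(word, corpora):
    """MI between the occurrence of `word` in a token and the topic label,
    estimated from per-topic unigram counts."""
    total = sum(sum(c.values()) for c in corpora.values())
    mi = 0.0
    for counts in corpora.values():
        p_t = sum(counts.values()) / total
        for present in (True, False):
            n = counts[word] if present else sum(counts.values()) - counts[word]
            p_joint = n / total
            p_w = sum(c[word] if present else sum(c.values()) - c[word]
                      for c in corpora.values()) / total
            if p_joint > 0:
                mi += p_joint * math.log(p_joint / (p_w * p_t))
    return mi

# Toy per-topic unigram counts; real input would be corpus transcriptions.
corpora = {
    "travel":  Counter({"well": 40, "um": 30, "flight": 25, "the": 100}),
    "banking": Counter({"well": 38, "um": 33, "balance": 28, "the": 95}),
}
# Low MI: usage barely depends on topic, so the word is a filler candidate.
fillers = [w for w in ["well", "um", "flight", "balance"]
           if mutual_information(w, corpora) < 0.01]
print(fillers)   # ['well', 'um']
```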
Authors:
Kwok Leung Lam,
Pascale Fung,
Paper number 2359
Abstract:
We propose a new confidence score for decoding and verification. Since
the traditional log likelihood ratio (LLR) is borrowed from speaker
verification, it may not be appropriate for decoding: there is no good
modelling and definition of LLR for decoding or utterance verification.
We propose a new formulation of LLR that can be used for both decoding
and verification tasks. Experimental results show that our proposed LLR
performs as well as a maximum-likelihood baseline in a decoding task,
and we obtain a 5% improvement in decoding over the traditional LLR.
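For readers unfamiliar with the baseline the abstract builds on, the
following is a minimal sketch of a conventional LLR verification score
(target model against an anti-model); the paper's new LLR formulation is
not reproduced here. The diagonal-Gaussian models, feature dimensions,
and zero threshold are illustrative assumptions.

```python
# Sketch: average per-frame log likelihood ratio, thresholded to verify.
import numpy as np

def gaussian_loglik(frames, mean, var):
    """Frame-wise log likelihood under a diagonal Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var).sum(axis=1)

def llr_score(frames, target, anti):
    """Average per-frame LLR; threshold it to accept/reject a hypothesis."""
    ll_target = gaussian_loglik(frames, *target)
    ll_anti = gaussian_loglik(frames, *anti)
    return (ll_target - ll_anti).mean()

rng = np.random.default_rng(0)
frames = rng.normal(0.0, 1.0, size=(50, 13))   # 50 frames, 13-dim features
target = (np.zeros(13), np.ones(13))           # (mean, var) of the claimed unit
anti = (np.full(13, 0.5), np.ones(13))         # competing anti-model
print("accept" if llr_score(frames, target, anti) > 0.0 else "reject")
```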
Authors:
Katrin Kirchhoff,
Jeff A Bilmes,
Paper number 2395
Abstract:
A recent development in the hybrid HMM/ANN speech recognition paradigm
is the use of several subword classifiers, each of which provides different
information about the speech signal. Although such combination methods
have obtained promising results, the strategies proposed so far have
been relatively simple: in most cases, frame-level subword unit
probabilities are combined using an unweighted product or sum rule. In this paper,
we argue and empirically demonstrate that the classifier combination
approach can benefit from a dynamically weighted combination rule,
where the weights are derived from higher-than-frame-level confidence
values.
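A minimal sketch of such a dynamically weighted sum rule follows, with
per-stream weights derived from an entropy-based confidence smoothed over
a window of frames (i.e., higher than frame level). The confidence
function, window length, and synthetic posteriors are illustrative
assumptions, not the paper's exact scheme.

```python
# Sketch: confidence-weighted combination of two classifiers' posteriors.
import numpy as np

def stream_confidence(posteriors, window=10):
    """Per-frame confidence: 1 - normalized entropy, smoothed over `window`
    frames so the weight reflects more than frame-level evidence."""
    eps = 1e-12
    ent = -(posteriors * np.log(posteriors + eps)).sum(axis=1)
    conf = 1.0 - ent / np.log(posteriors.shape[1])
    return np.convolve(conf, np.ones(window) / window, mode="same")

def combine(streams, window=10):
    """Confidence-weighted sum of per-stream posteriors, renormalized."""
    confs = np.stack([stream_confidence(p, window) for p in streams])
    weights = confs / confs.sum(axis=0, keepdims=True)      # (S, T)
    combined = sum(w[:, None] * p for w, p in zip(weights, streams))
    return combined / combined.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
# Two subword classifiers' posteriors over 40 phone classes for 100 frames.
streams = [rng.dirichlet(np.ones(40), size=100) for _ in range(2)]
print(combine(streams).shape)   # (100, 40)
```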
Authors:
Yeou-Jiunn Chen,
Chung-Hsien Wu,
Gwo-Lang Yan,
Paper number 1366
Abstract:
In this paper, prosodic information, a distinctive and important feature
of Mandarin speech, is used for Mandarin telephone speech utterance
verification. A two-stage strategy, recognition followed by verification,
is adopted. For keyword recognition, 59 context-independent subsyllables,
i.e., 22 INITIALs and 37 FINALs in Mandarin speech, plus one
background/silence model, are used as the basic recognition units. For
utterance verification, 12 anti-subsyllable HMMs, 175 context-dependent
prosodic HMMs, and five anti-prosodic HMMs are constructed. A keyword
verification function combining phonetic-phase and prosodic-phase
verification is investigated. On a test set of 2400 conversational speech
utterances from 20 speakers (12 male and 8 female), at an 8.5% false
rejection rate, the proposed verification method yields a 17.8% false
alarm rate. Furthermore, the method correctly rejects 90.4% of
non-keywords. Comparison with a baseline system without prosodic-phase
verification shows that prosodic information benefits verification
performance.
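The following is a minimal sketch of a two-phase verification function in
this spirit: a phonetic-phase score and a prosodic-phase score are fused
and thresholded. The linear fusion, weight, and threshold are illustrative
assumptions; the paper's actual verification function is not reproduced.

```python
# Sketch: fuse phonetic- and prosodic-phase LLR scores, then threshold.
def verify_keyword(phonetic_llr, prosodic_llr, weight=0.3, threshold=0.0):
    """Accept the keyword hypothesis iff the fused score clears the threshold."""
    fused = (1.0 - weight) * phonetic_llr + weight * prosodic_llr
    return fused > threshold

# A hypothesis with a marginal phonetic score can be rejected (or rescued)
# once prosodic evidence is taken into account.
print(verify_keyword(phonetic_llr=0.4, prosodic_llr=-2.0))   # False
print(verify_keyword(phonetic_llr=0.4, prosodic_llr=1.0))    # True
```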
Authors:
Hiroshi Matsuo,
Masaaki Ishigame,
Paper number 1609
Abstract:
We propose an error correction technique for speaker-independent isolated
word recognition that compensates a word's likelihood with a likelihood
calculated by a phonetic bigram, a phoneme model expressing frame
correlation within an utterance. A speaker-independent isolated word
recognition experiment showed that the proposed technique reduces
recognition errors compared to conventional techniques. Without any
speaker adaptation, it achieves performance almost equal to that of a
conventional phoneme model adapted using several words.
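A minimal sketch of the compensation step follows, under the assumption
that it can be approximated as interpolating the HMM word likelihood with
the phonetic-bigram likelihood; the interpolation weight and candidate
scores are illustrative, and the paper's exact compensation formula is not
reproduced.

```python
# Sketch: re-rank isolated-word candidates with a compensated likelihood.
def compensated_score(acoustic_ll, phonetic_bigram_ll, lam=0.2):
    """Interpolate the HMM word likelihood with the phonetic-bigram likelihood."""
    return acoustic_ll + lam * phonetic_bigram_ll

# Hypothetical (acoustic, phonetic-bigram) log likelihoods per candidate.
candidates = {"tokyo": (-410.0, -55.0), "kyoto": (-408.0, -80.0)}
best = max(candidates, key=lambda w: compensated_score(*candidates[w]))
print(best)   # "tokyo": the bigram term corrects the acoustic near-miss
```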
Authors:
Andreas M Wendemuth, Philips Research Labs Aachen Germany (Germany)
Georg Rose, Philips Research Labs Aachen Germany (Germany)
J.G.A. Dolfing, Philips Research Labs Aachen Germany (Germany)
Paper number 1664
Abstract:
This paper addresses the correct choice and combination of confidence
measures in large vocabulary speech recognition tasks. We classify single
words within continuous as well as large-vocabulary utterances into two
categories: words within the vocabulary that are recognized correctly,
and other words, namely misrecognized words and (less frequent)
out-of-vocabulary (OOV) words. To this end, we investigate the confidence
error rate (CER) for several classes of confidence measures and
transformations. In particular, we employ data-independent and
data-dependent measures. The transformations we investigate include
mappings to single confidence measures and linear combinations of these
measures, computed by means of neural networks trained with
Bayes-optimal and with Gardner-Derrida-optimal criteria. Compared to a
recognition system without confidence measures, selecting suitable
combinations of confidence measures, neural network architectures, and
training methods consistently improves the CER.
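As a sketch of the linear-combination case, the following trains a
single-layer network (logistic regression) to map several confidence
measures to a correct/incorrect decision under a cross-entropy criterion,
whose minimizer approximates the Bayes posterior; the Gardner-Derrida
criterion and the paper's actual measures are not reproduced, and the
data here is synthetic.

```python
# Sketch: linear combination of confidence measures via a one-layer network.
import numpy as np

rng = np.random.default_rng(2)
n = 1000
X = rng.normal(size=(n, 3))                   # 3 confidence measures per word
true_w = np.array([2.0, -1.0, 0.5])
y = (X @ true_w + rng.normal(size=n) > 0).astype(float)  # 1 = word correct

w, b = np.zeros(3), 0.0
for _ in range(500):                          # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # sigmoid output
    grad_w, grad_b = X.T @ (p - y) / n, (p - y).mean()
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

# Confidence error rate (CER): fraction of words whose accept/reject
# decision (p > 0.5) disagrees with the true label.
pred = 1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5
print("CER:", (pred != y.astype(bool)).mean())
```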
Authors:
Denis Jouvet, France Télécom, CNET (France)
Katarina Bartkova, France Télécom, CNET (France)
Guy Mercier, France Télécom, CNET (France)
Paper number 1663
Abstract:
An efficient rejection procedure is necessary to reject out-of-vocabulary
words and noise tokens that occur in voice activated vocal services.
Garbage or filler models are very useful for such a task. However,
post-processing the recognized hypothesis with a statistical likelihood
ratio test can refine the decision and improve performance. These tests
can be applied either to acoustic parameters or to phonetic or prosodic
parameters that are not taken into account by the HMM-based decoder.
This paper focuses on the post-processing procedure and shows that
making the likelihood ratio decision threshold dependent on the
recognized hypothesis largely improves the efficiency of the rejection
procedure. Models and anti-models are one of the key points of such an
approach; their training and usage are discussed, as well as the
contextual modeling involved. Finally, results are reported on a field
database collected from a 2000-word directory task using various
phonetic and prosodic parameters.
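A minimal sketch of the hypothesis-dependent decision rule follows: the
likelihood ratio threshold is looked up per recognized word rather than
fixed globally. The words and threshold values are hypothetical
placeholders, not taken from the field database.

```python
# Sketch: rejection with a threshold that depends on the recognized word.
DEFAULT_THRESHOLD = 0.0
word_thresholds = {"dupont": 0.8, "durand": -0.2}   # hypothetical values

def accept(hypothesis, llr):
    """Hypothesis-dependent decision: the threshold varies with the word."""
    return llr >= word_thresholds.get(hypothesis, DEFAULT_THRESHOLD)

print(accept("dupont", 0.5))   # False: confusable word, stricter threshold
print(accept("durand", 0.5))   # True: easy word, lenient threshold
```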
Authors:
Jeff A Bilmes,
Paper number 2105
Abstract:
Good HMM-based speech recognition performance requires that the HMM's
conditional independence assumptions introduce at most minimal
inaccuracies. In this work, those assumptions are relaxed in a principled
way. For each hidden state value, additional dependencies are added
between observation elements to increase both accuracy and
discriminability. These additional dependencies are chosen according to
natural statistical dependencies extant in the training data that are not
well modeled by an HMM. The result is called a buried Markov model (BMM)
because the underlying Markov chain in an HMM is further hidden (buried)
by specific cross-observation dependencies. Gaussian mixture HMMs are
extended to represent BMM dependencies, and new EM update equations are
derived. In preliminary experiments on a large-vocabulary isolated-word
speech database, BMMs achieve an 11% improvement in WER with only a 9.5%
increase in the number of parameters, in a recognition system using a
single state per monophone.
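As a sketch of the dependency-selection idea, the following ranks
candidate cross-observation dependencies (feature j at frame t-1 feeding
feature i at frame t) for one state by the strength of their correlation
in that state's training frames, keeping the top few. Correlation stands
in here for the paper's dependency measure, the data is synthetic, and
the BMM's extended Gaussians and EM updates are not reproduced.

```python
# Sketch: pick cross-observation dependencies from training-data statistics.
import numpy as np

def select_dependencies(frames, k=3):
    """frames: (T, D) observations aligned to one state. Returns the k
    (prev_feature, cur_feature) pairs with the largest |correlation|."""
    prev, cur = frames[:-1], frames[1:]
    d = frames.shape[1]
    corr = np.corrcoef(prev.T, cur.T)[:d, d:]      # corr(prev_j, cur_i)
    pairs = [(abs(corr[j, i]), j, i) for j in range(d) for i in range(d)]
    return [(j, i) for _, j, i in sorted(pairs, reverse=True)[:k]]

rng = np.random.default_rng(3)
frames = rng.normal(size=(500, 4))
frames[1:, 0] += 0.9 * frames[:-1, 2]      # plant a dependency: 2 -> 0
print(select_dependencies(frames))          # (2, 0) ranks first
```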