Utterance Verification/Acoustic Modeling

Topic Independent Language Model for Key-Phrase Detection and Verification

Authors:

Tatsuya Kawahara,
Shuji Doshita,

Page (NA) Paper number 1687

Abstract:

A topic-independent approach to lexical and language modeling for robust key-phrase detection and verification is presented. Instead of assuming a domain-specific lexicon and language model, our model is designed to characterize filler phrases according to speaking style, and can therefore be trained with large corpora that cover different topics but share the same style. A mutual information criterion is used to select topic-independent filler words, and their N-gram model is used to verify key-phrase hypotheses. A dialogue-style-dependent filler model improves key-phrase detection in different dialogue applications. A lecture-style-dependent model is trained on transcriptions of various oral presentations after filtering out topic-specific words. It verifies key-phrases uttered during lectures on different topics much better than the conventional syllable-based model and the large-vocabulary model.
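
As an illustration of how a mutual information criterion could rank words by topic dependence, here is a minimal Python sketch. It is not taken from the paper: the function name, the count-based probability estimates, and the binary "is this word / is not this word" formulation are all assumptions. Words whose score is close to zero are spread evenly across topics and would be candidates for topic-independent fillers.

```python
import math
from typing import Dict


def word_topic_mutual_information(word_counts: Dict[str, int],
                                  topic_totals: Dict[str, int]) -> float:
    """Mutual information (in nats) between word presence and topic.

    Hypothetical helper, not the paper's exact criterion.
    word_counts[t]:  occurrences of the word in topic t
    topic_totals[t]: total number of tokens in topic t
    """
    n = sum(topic_totals.values())
    n_word = sum(word_counts.get(t, 0) for t in topic_totals)
    mi = 0.0
    for topic, n_topic in topic_totals.items():
        in_topic = word_counts.get(topic, 0)
        # Two events per topic: the token is this word, or it is not.
        for joint, marginal in ((in_topic, n_word),
                                (n_topic - in_topic, n - n_word)):
            if joint > 0:
                # p(x,t) * log( p(x,t) / (p(x) * p(t)) ), estimated from counts
                mi += (joint / n) * math.log(joint * n / (marginal * n_topic))
    return mi


if __name__ == "__main__":
    totals = {"travel": 100, "finance": 100}
    evenly_spread = word_topic_mutual_information({"travel": 10, "finance": 10}, totals)
    topic_bound = word_topic_mutual_information({"travel": 20, "finance": 0}, totals)
    print(evenly_spread, topic_bound)
```

In this sketch, a word that occurs at the same rate in every topic scores zero, while a word concentrated in one topic scores strictly higher.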

IC991687.PDF (From Author) IC991687.PDF (Rasterized)


A More Efficient And Optimal LLR For Decoding And Verification

Authors:

Kwok Leung Lam,
Pascale Fung,

Page (NA) Paper number 2359

Abstract:

We propose a new confidence score for decoding and verification. Because the traditional log likelihood ratio (LLR) is borrowed from speaker verification, it may not be appropriate for decoding: there is no good model or definition of the LLR for decoding and utterance verification. We propose a new formulation of the LLR that can be used for both decoding and verification tasks. Experimental results show that the proposed LLR performs as well as maximum likelihood in a decoding task. It also yields a 5% improvement in decoding compared with the traditional LLR.
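
For context, the traditional LLR that the abstract refers to can be sketched as below. This illustrates only the conventional ratio between a hypothesized model and an alternative model, not the new formulation proposed in the paper. The function names, the per-frame normalization, and the default threshold are assumptions.

```python
from typing import Sequence


def traditional_llr(target_loglik: Sequence[float],
                    anti_loglik: Sequence[float]) -> float:
    """Duration-normalized log likelihood ratio for one segment.

    target_loglik[t]: log p(o_t | hypothesized model)
    anti_loglik[t]:   log p(o_t | alternative or anti-model)
    """
    if len(target_loglik) != len(anti_loglik) or not target_loglik:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(t - a for t, a in zip(target_loglik, anti_loglik))
    return total / len(target_loglik)


def accept(llr: float, threshold: float = 0.0) -> bool:
    """Accept the hypothesis when the ratio reaches the (assumed) threshold."""
    return llr >= threshold


if __name__ == "__main__":
    score = traditional_llr([-1.0, -2.0], [-2.0, -4.0])
    print(score, accept(score))
```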

IC992359.PDF (From Author) IC992359.PDF (Rasterized)


Dynamic Classifier Combination in Hybrid Speech Recognition Systems using Utterance-Level Confidence Values

Authors:

Katrin Kirchhoff,
Jeff A Bilmes,

Page (NA) Paper number 2395

Abstract:

A recent development in the hybrid HMM/ANN speech recognition paradigm is the use of several subword classifiers, each of which provides different information about the speech signal. Although the combining methods have obtained promising results, the strategies so far proposed have been relatively simple. In most cases frame-level subword unit probabilities are combined using an unweighted product or sum rule. In this paper, we argue and empirically demonstrate that the classifier combination approach can benefit from a dynamically weighted combination rule, where the weights are derived from higher-than-frame-level confidence values.
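
The difference between an unweighted product rule and a weighted one can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the log-linear form, the function names, and the way confidence values are turned into weights (simple normalization) are all hypothetical. Setting every weight to 1 recovers the unweighted product rule mentioned in the abstract.

```python
import math
from typing import List, Sequence


def combine_frame(posteriors: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Weighted product rule for one frame, renormalized to sum to 1.

    posteriors[k][q] is classifier k's probability for subword unit q.
    weights[k] scales classifier k's contribution in the log domain.
    """
    n_units = len(posteriors[0])
    log_scores = [
        sum(w * math.log(max(p[q], 1e-12)) for p, w in zip(posteriors, weights))
        for q in range(n_units)
    ]
    peak = max(log_scores)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(s - peak) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def weights_from_confidence(confidences: Sequence[float]) -> List[float]:
    """Assumed mapping: normalize utterance-level confidences to sum to 1."""
    total = sum(confidences)
    return [c / total for c in confidences]


if __name__ == "__main__":
    frame = [[0.5, 0.5], [0.9, 0.1]]
    print(combine_frame(frame, [1.0, 1.0]))
    print(combine_frame(frame, weights_from_confidence([3.0, 1.0])))
```

Because the weights here are computed once per utterance, the same pair of weights would be applied to every frame of that utterance.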

IC992395.PDF (From Author) IC992395.PDF (Rasterized)


Utterance Verification Using Prosodic Information for Mandarin Telephone Speech Keyword Spotting

Authors:

Yeou-Jiunn Chen,
Chung-Hsien Wu,
Gwo-Lang Yan,

Page (NA) Paper number 1366

Abstract:

In this paper, prosodic information, a distinctive and important feature of Mandarin speech, is used for Mandarin telephone speech utterance verification. A two-stage strategy, with recognition followed by verification, is adopted. For keyword recognition, 59 context-independent subsyllables, i.e., 22 INITIAL's and 37 FINAL's in Mandarin speech, and one background/silence model, are used as the basic recognition units. For utterance verification, 12 anti-subsyllable HMM's, 175 context-dependent prosodic HMM's, and five anti-prosodic HMM's are constructed. A keyword verification function combining phonetic-phase and prosodic-phase verification is investigated. On a test set of 2400 conversational speech utterances from 20 speakers (12 males and 8 females), the proposed verification method yielded a 17.8% false alarm rate at 8.5% false rejection. Furthermore, this method correctly rejected 90.4% of nonkeywords. Comparison with a baseline system without prosodic-phase verification shows that prosodic information benefits verification performance.

IC991366.PDF (From Author) IC991366.PDF (Rasterized)


Error Correction for Speaker-Independent Isolated Word Recognition through Likelihood Compensation Using Phonetic Bigram

Authors:

Hiroshi Matsuo,
Masaaki Ishigame,

Page (NA) Paper number 1609

Abstract:

We propose an error correction technique for speaker-independent isolated word recognition that compensates a word's likelihood with a likelihood calculated by a phonetic bigram. The phonetic bigram is a phoneme model expressing frame correlation within an utterance. A speaker-independent isolated word recognition experiment showed that the proposed technique reduces recognition errors compared to conventional techniques. Without speaker adaptation, the proposed technique achieves performance almost equal to that of the conventional phoneme model adapted using several words.

IC991609.PDF (From Author) IC991609.PDF (Rasterized)


Advances In Confidence Measures For Large Vocabulary

Authors:

Andreas M Wendemuth, Philips Research Labs Aachen Germany (Germany)
Georg Rose, Philips Research Labs Aachen Germany (Germany)
J.G.A. Dolfing, Philips Research Labs Aachen Germany (Germany)

Page (NA) Paper number 1664

Abstract:

This paper addresses the correct choice and combination of confidence measures in large vocabulary speech recognition tasks. We classify single words within continuous as well as large vocabulary utterances into two categories: utterances within the vocabulary which are recognized correctly, and other utterances, namely misrecognized utterances or (less frequently) out-of-vocabulary (OOV) utterances. To this end, we investigate the confidence error rate (CER) for several classes of confidence measures and transformations. In particular, we employed data-independent and data-dependent measures. The transformations we investigated include mapping to single confidence measures and linear combinations of these measures. These combinations are computed by means of neural networks trained with Bayes-optimal and with Gardner-Derrida-optimal criteria. Compared to a recognition system without confidence measures, the selection of (various combinations of) confidence measures and of suitable neural network architectures and training methods continuously improves the CER.
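
The confidence error rate can be illustrated with a short sketch. It assumes the commonly used definition, in which each recognized word is accepted or rejected by thresholding its confidence and the CER is the fraction of words for which that decision disagrees with whether the word was actually correct. The paper may define it with additional detail; the function name and the threshold are assumptions.

```python
from typing import Sequence


def confidence_error_rate(confidences: Sequence[float],
                          is_correct: Sequence[bool],
                          threshold: float) -> float:
    """Fraction of words whose accept/reject decision is wrong.

    A word is accepted when its confidence is at or above the threshold.
    Errors are accepted-but-misrecognized and rejected-but-correct words.
    """
    if len(confidences) != len(is_correct) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")
    errors = sum((c >= threshold) != ok
                 for c, ok in zip(confidences, is_correct))
    return errors / len(confidences)


if __name__ == "__main__":
    scores = [0.9, 0.2, 0.7, 0.4]
    labels = [True, False, False, True]
    print(confidence_error_rate(scores, labels, threshold=0.5))
```

Under this definition, a system without confidence measures corresponds to accepting every word, so its CER equals the word error fraction of the recognizer output.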

IC991664.PDF (From Author) IC991664.PDF (Rasterized)


Hypothesis Dependent Threshold Setting for Improved Out-of-Vocabulary Data Rejection

Authors:

Denis Jouvet, France Télécom, CNET (France)
Katarina Bartkova, France Télécom, CNET (France)
Guy Mercier, France Télécom, CNET (France)

Page (NA) Paper number 1663

Abstract:

An efficient rejection procedure is necessary to reject out-of-vocabulary words and noise tokens that occur in voice-activated vocal services. Garbage or filler models are very useful for such a task. However, post-processing of the recognized hypothesis, based on a likelihood ratio statistical test, can refine the decision and improve performance. These tests can be applied either to acoustic parameters or to phonetic or prosodic parameters that are not taken into account by the HMM-based decoder. This paper focuses on the post-processing procedure and shows that making the likelihood ratio decision threshold dependent on the recognized hypothesis greatly improves the efficiency of the rejection procedure. Models and anti-models are key points of such an approach. Their training and usage are also discussed, as well as the contextual modeling involved. Finally, results are reported on a field database collected from a 2000-word directory task using various phonetic and prosodic parameters.
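
The idea of a hypothesis-dependent decision threshold can be sketched in a few lines. This is an illustration only, assuming a likelihood ratio score has already been computed for the recognized hypothesis. The function name, the dictionary of per-hypothesis thresholds, the fallback threshold, and the example words are hypothetical and do not come from the paper.

```python
from typing import Dict


def reject(hypothesis: str,
           llr: float,
           thresholds: Dict[str, float],
           default_threshold: float = 0.0) -> bool:
    """Return True when the recognized hypothesis should be rejected.

    thresholds maps each vocabulary entry to its own decision threshold;
    entries without a tuned value fall back to default_threshold.
    """
    return llr < thresholds.get(hypothesis, default_threshold)


if __name__ == "__main__":
    # Hypothetical thresholds, e.g. tuned per word on held-out data.
    per_word = {"martin": 1.0, "dupont": -0.5}
    for word in ("martin", "dupont", "durand"):
        print(word, reject(word, llr=0.2, thresholds=per_word))
```

With a single global threshold, all three example hypotheses would receive the same decision for the same score; with per-hypothesis thresholds, the same score can lead to rejection for one word and acceptance for another.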

IC991663.PDF (From Author) IC991663.PDF (Rasterized)


Buried Markov Models for Speech Recognition

Authors:

Jeff A Bilmes,

Page (NA) Paper number 2105

Abstract:

Good HMM-based speech recognition performance requires that the HMM conditional independence assumptions introduce at most minimal inaccuracies. In this work, HMM conditional independence assumptions are relaxed in a principled way. For each hidden state value, additional dependencies are added between observation elements to increase both accuracy and discriminability. These additional dependencies are chosen according to natural statistical dependencies extant in training data that are not well modeled by an HMM. The result is called a buried Markov model (BMM) because the underlying Markov chain in an HMM is further hidden (buried) by specific cross-observation dependencies. Gaussian mixture HMMs are extended to represent BMM dependencies, and new EM update equations are derived. In preliminary experiments with a large-vocabulary isolated-word speech database, BMMs achieve an 11% improvement in WER with only a 9.5% increase in the number of parameters, using a speech recognition system with a single state per monophone.

IC992105.PDF (From Author) IC992105.PDF (Rasterized)
