Session: SPEECH-L4
Time: 1:00 - 3:00, Wednesday, May 9, 2001
Location: Room 150
Title: Confidence Measures and Rejection
Chair: Jim Glass

1:00, SPEECH-L4.1
A ONE-PASS STRATEGY FOR KEYWORD SPOTTING AND VERIFICATION
C. LAI, B. SHI
One common method for keyword spotting in unconstrained speech is based upon a two-pass strategy consisting of Viterbi decoding to detect and segment possible keyword hits, followed by the computation of a confidence measure to verify those hits. In this paper, we propose a simple one-pass strategy in which the confidence measure is computed simultaneously with a Viterbi-like decoding stage. However, backtracking is not required, which, coupled with the need for only a single pass through the utterance, significantly reduces the memory requirements of the algorithm. This feature makes it well suited for devices where processing power and memory are limited. Experimental results on a connected digits task show that performance of the decoding is comparable to that of a Viterbi search with backtracking. Experimental results on spotting days of the week in continuous speech indicate that the calculated confidence measure is effective in reducing the number of false alarms.
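The abstract does not give the recursion itself, so the following Python sketch is only illustrative of the general idea: a frame-synchronous, Viterbi-like update over a left-to-right keyword model run in parallel with a filler/background score, with the log-likelihood ratio at the keyword exit state serving as the confidence measure. The inputs (per-frame log-likelihoods) and the threshold are assumptions, not details from the paper.

import numpy as np

def one_pass_keyword_spotting(keyword_ll, filler_ll, threshold=0.0):
    # keyword_ll: (T, S) per-frame log-likelihoods for the S states of a
    #             left-to-right keyword model (assumed precomputed).
    # filler_ll:  (T,) per-frame log-likelihoods of a filler/background model.
    T, S = keyword_ll.shape
    state = np.full(S, -np.inf)     # best score ending in each keyword state
    filler = 0.0                    # score of the all-filler path
    hits = []
    for t in range(T):
        # Viterbi-like update: state 0 can be entered from the filler path;
        # every other state is reached by a self-loop or from its predecessor.
        entry = np.concatenate(([filler], state[:-1]))
        state = np.maximum(state, entry) + keyword_ll[t]
        filler += filler_ll[t]
        # Confidence = log-likelihood ratio at the keyword exit state.
        # No backpointers are stored, so no backtracking is needed.
        confidence = state[-1] - filler
        if confidence > threshold:
            hits.append((t, confidence))
    return hits

# Toy usage with random scores:
hits = one_pass_keyword_spotting(np.random.randn(200, 5), np.random.randn(200), 5.0)

Because only the current state vector and a running filler score are kept, memory is O(S) regardless of utterance length, which is the kind of saving the abstract attributes to avoiding backtracking.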

1:20, SPEECH-L4.2
A SUPPORT VECTOR MACHINES-BASED REJECTION TECHNIQUE FOR SPEECH RECOGNITION
C. MA, M. RANDOLPH, J. DRISH
Support Vector Machines represent a new approach to pattern classification developed from the theory of Structural Risk Minimization. In this paper, we present an investigation into the application of Support Vector Machines to the confidence measurement problem in speech recognition. Specifically, based on the results of an initial decoding of an utterance, we derive a feature vector consisting of parameters such as word score density, N-best word score density differences, relative word score, and relative word duration as input to the confidence measurement process, in which utterances hypothesized to be correct are accepted and utterances determined to be incorrect are rejected. We also propose a new approach to training Support Vector Machines. A Support Vector Machine classifier is trained and tested, and its results are compared with those of other statistical classification methods.
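As a reading aid, the accept/reject step can be viewed as binary classification over recognizer-derived features. The sketch below is not the authors' method: it uses scikit-learn's off-the-shelf SVC rather than the new training approach proposed in the paper, and the feature values are random placeholders standing in for word score density, N-best word score density differences, relative word score, and relative word duration.

import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: one row per recognized word/utterance with confidence features
#    (placeholders here); y: 1 = hypothesis correct, 0 = should be rejected.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = (X[:, 0] + 0.2 * rng.standard_normal(200) > 0.5).astype(int)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X[:150], y[:150])
accept = clf.predict(X[150:])        # 1 = accept the hypothesis, 0 = reject
print("acceptance rate on held-out set:", accept.mean())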

1:40, SPEECH-L4.3
ON COMBINING RECOGNIZERS FOR IMPROVED RECOGNITION OF SPELLED NAMES
D. JOUVET, S. DROGUET
This paper deals with the recognition of spelled names over the telephone. Two recognition approaches are recalled. One is based on a forward-backward algorithm in which the spelling lexicon is handled by the A* algorithm in the backward pass. The other is a two-step approach that relies on a discrete HMM-based retrieval procedure. Both approaches integrate a rejection test. Combinations of the two approaches are investigated in this paper. First, a sequential combination is presented: the two-step approach is used only when the forward-backward approach does not yield an answer because of memory limitations. This sequential combination, evaluated on field data collected from a vocal directory service, takes the best of both approaches. Results are presented for the recognition of valid spelled names as well as for the rejection of incorrect data. Finally, a detailed analysis of the recognition results shows that comparing the outputs of the two approaches leads to an efficient reliability criterion.
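For clarity, the sequential combination and the agreement-based reliability criterion can be summarized in a short Python sketch. The recognizer interfaces are hypothetical: each recognizer is assumed to return the recognized name, or None when its search fails (e.g. for memory reasons) or its rejection test fires.

def sequential_combination(utterance, forward_backward, two_step):
    # Use the two-step recognizer only when the forward-backward / A*-based
    # recognizer returns no answer.
    result = forward_backward(utterance)
    if result is None:
        result = two_step(utterance)
    return result

def agreement_reliability(result_fb, result_two_step):
    # Reliability criterion sketch: a result is deemed reliable when both
    # approaches produce the same spelled name.
    return result_fb is not None and result_fb == result_two_step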

2:00, SPEECH-L4.4
ROBUST CONFIDENCE ANNOTATION AND REJECTION FOR CONTINUOUS SPEECH RECOGNITION
B. MAISON, R. GOPINATH
We seek confidence-scoring techniques that perform well on a broad variety of tasks. Our main focus is on word-level error rejection, but most results apply to other scenarios as well. We introduce a variation of the Normalized Cross Entropy adapted to this purpose and use it to automatically select features and optimize the word-level confidence measure on several test sets. Sentence-level confidence geared toward the rejection of out-of-grammar utterances is also investigated. The combination of a word-graph-based technique and the acoustic score shows excellent performance across all the tasks we considered.
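For reference, the baseline Normalized Cross Entropy that the paper adapts is commonly defined as the relative reduction in cross entropy over a constant predictor; the variation introduced in the paper differs in details not given in the abstract. A minimal Python sketch, assuming per-word confidence scores in (0, 1) and 0/1 correctness labels with both classes present:

import numpy as np

def normalized_cross_entropy(conf, correct, eps=1e-12):
    # conf:    confidence scores in (0, 1), one per word hypothesis.
    # correct: 1 if the word was recognized correctly, else 0.
    # NCE = (H_max + sum_correct log2(conf) + sum_wrong log2(1 - conf)) / H_max,
    # where H_max is the cross entropy of always predicting the empirical
    # correct rate p_c.  A perfect predictor scores 1, the constant one 0.
    conf = np.clip(np.asarray(conf, float), eps, 1.0 - eps)
    correct = np.asarray(correct, int)
    n, n_c = len(correct), correct.sum()
    p_c = n_c / n
    h_max = -(n_c * np.log2(p_c) + (n - n_c) * np.log2(1.0 - p_c))
    h_conf = (np.log2(conf[correct == 1]).sum()
              + np.log2(1.0 - conf[correct == 0]).sum())
    return (h_max + h_conf) / h_max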

2:20, SPEECH-L4.5
CONFIDENCE MEASURES FOR SPOKEN DIALOGUE SYSTEMS
R. SAN-SEGUNDO, B. PELLOM, K. HACIOGLU, W. WARD, J. PARDO
This paper provides improved confidence assessment for the detection of word-level speech recognition errors, out-of-domain utterances, and incorrect concepts in the CU Communicator system. New features from the understanding component are proposed for confidence annotation at the utterance and concept levels. Using data collected over seven months (more than 900 calls from real users), it is shown that 53.2% of incorrectly recognized words, 53.2% of out-of-domain utterances, and 50.1% of incorrect concepts are detected at a 5% false rejection rate. A neural network is used to combine all features at each level. At the word level, we propose the use of confidence measures to combine several hypotheses from different recognizers, obtaining a 14.0% relative word error rate reduction. We also propose a new algorithm for building a word graph from several hypotheses.
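The reported detection figures are operating points: a threshold on the confidence score is chosen so that 5% of correct items fall below it (false rejections), and the fraction of errors below the same threshold is the detection rate. A small, purely illustrative Python sketch of that measurement:

import numpy as np

def detection_at_false_rejection(conf, correct, fr_target=0.05):
    # conf:    confidence scores (higher = more likely correct).
    # correct: 1 for correctly recognized items, 0 for errors.
    conf = np.asarray(conf, float)
    correct = np.asarray(correct, int)
    pos = np.sort(conf[correct == 1])
    # Threshold below which at most fr_target of the correct items lie.
    thr = pos[int(np.floor(fr_target * len(pos)))]
    detection_rate = (conf[correct == 0] < thr).mean()
    return thr, detection_rate

# Toy usage with synthetic scores (900 correct items, 100 errors):
conf = np.concatenate([np.random.beta(5, 2, 900), np.random.beta(2, 5, 100)])
labels = np.concatenate([np.ones(900, int), np.zeros(100, int)])
print(detection_at_false_rejection(conf, labels))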

2:40, SPEECH-L4.6
A COMPARISON AND COMBINATION OF METHODS FOR OOV WORD DETECTION AND WORD CONFIDENCE SCORING
T. HAZEN, I. BAZZI
This paper examines an approach for combining two different methods for detecting errors in the output of a speech recognizer. The first method attempts to alleviate recognition errors by using an explicit model for detecting the presence of out-of-vocabulary (OOV) words. The second method identifies potentially misrecognized words from a set of confidence features extracted from the recognition process using a confidence scoring model. Since these two methods are inherently different, an approach which combines the techniques can provide significant advantages over either of the individual methods. In experiments in the JUPITER weather domain, we compare and contrast the two approaches and demonstrate the advantage of the combined approach. In comparison to either of the two individual approaches, the combined approach achieves over 25% fewer false acceptances of incorrectly recognized keywords (from 55% to 40%) at a 98% acceptance rate of correctly recognized keywords.
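The abstract does not state the combination rule, so the following is only a hypothetical sketch of how the two detectors' outputs might be merged: a hypothesized keyword is rejected when either the explicit OOV model assigns a high OOV posterior to its region or the confidence-scoring model assigns it a low confidence, with both thresholds tuned jointly to the target operating point (e.g. 98% acceptance of correctly recognized keywords).

def combined_reject(oov_posterior, word_confidence,
                    oov_threshold=0.5, conf_threshold=0.5):
    # Hypothetical OR-combination of the two error detectors; the threshold
    # values here are placeholders to be tuned on development data.
    return oov_posterior > oov_threshold or word_confidence < conf_threshold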