Chair: Jay Wilpon, AT&T Bell Laboratories (USA)
R.C. Rose, AT&T Bell Laboratories (USA)
B.H. Juang, AT&T Bell Laboratories (USA)
C.H. Lee, AT&T Bell Laboratories (USA)
A procedure is proposed for verifying the occurrence of string hypotheses produced by a hidden Markov model (HMM) based continuous speech recognizer. Most existing procedures verify word hypotheses through likelihood ratio scores computed using ad hoc approximations for the density of the alternative hypothesis in the denominator of the likelihood ratio statistic. The discriminative training procedure described in this paper adjusts the parameters of both the null hypothesis and the alternative hypothesis models to increase the power of a hypothesis test for utterance verification. The training procedure was evaluated on its ability to detect a twenty-word vocabulary in a subset of the Switchboard conversational speech corpus. Experimental results show that the procedure yields a significant improvement in the word verification operating characteristic, as well as an improvement in overall system performance.
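As an illustration of the likelihood-ratio test underlying such verification procedures, the following minimal sketch scores an utterance against a null (target) model and an alternative model and accepts the word hypothesis when the log-likelihood ratio clears a threshold. The diagonal-Gaussian densities and function names are illustrative stand-ins for the HMM-based models described in the abstract, not the paper's actual implementation.

    import numpy as np

    def diag_gauss_loglik(obs, mean, var):
        # Frame-wise log-likelihood under a diagonal-covariance Gaussian;
        # a stand-in for the HMM likelihoods used in the paper.
        return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (obs - mean) ** 2 / var, axis=-1)

    def log_likelihood_ratio(obs, null_model, alt_model):
        # obs: (T, D) acoustic frames; each model is a (mean, var) pair.
        return (diag_gauss_loglik(obs, *null_model).sum()
                - diag_gauss_loglik(obs, *alt_model).sum())

    def verify(obs, null_model, alt_model, threshold=0.0):
        # Accept the hypothesis when the utterance-level log-likelihood ratio
        # exceeds the threshold; sweeping the threshold traces out the
        # verification operating characteristic.
        return log_likelihood_ratio(obs, null_model, alt_model) >= threshold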
Mazin G. Rahim, AT&T Bell Laboratories (USA)
Chin-Hui Lee, AT&T Bell Laboratories (USA)
Biing-Hwang Juang, AT&T Bell Laboratories (USA)
Utterance verification represents an important technology in the design of user-friendly speech recognition systems. This paper addresses the issue of robustness in utterance verification. Four different approaches to robustness have been investigated: a string-based likelihood measure for the detection of non-vocabulary words and "putative" errors, a signal bias removal method for channel normalization, an on-line adaptation technique for achieving a desirable trade-off between false rejections and false alarms, and a discriminative training method for minimizing the expected string error rate. When these techniques were all integrated into a state-of-the-art connected digit recognition system, the string error rate was found to decrease by up to 57% at a rejection rate of 5%. For non-vocabulary word strings, the proposed utterance verification system rejected over 99.9% of extraneous speech.
Hiroshi Kanazawa, Toshiba Corporation (JAPAN)
Mitsuyoshi Tachimori, Toshiba Corporation (JAPAN)
Yoichi Takebayashi, Toshiba Corporation (JAPAN)
We propose a new wordspotting method that combines word-based pattern matching and phoneme-based HMMs. Word-based pattern matching, based on the time-frequency representation of a whole word pattern, is robust against pattern variations and background noise, while the phoneme-based HMM, which represents phonemic features within a word pattern, is flexible enough to allow vocabulary expansion. Because they operate at different scopes, the two approaches have complementary characteristics in terms of robustness and accuracy. To take advantage of both, we integrate the two types of wordspotting results under a unified criterion. A syntactic and semantic parser is also used to prune the wordspotting results for spontaneous speech understanding. Experimental results indicate the effectiveness of the proposed method.
T. Schultz, University of Karlsruhe (GERMANY) and Carnegie Mellon University (USA)
I. Rogina, University of Karlsruhe (GERMANY) and Carnegie Mellon University (USA)
In this paper, several improvements of our speech-to-speech translation system JANUS-2 on spontaneous human-to-human dialogs are presented. Common phenomena in spontaneous speech are described, followed by a classification of different types of noises. To handle the variety of spontaneous effects in human-to-human dialogs, special noise models are introduced representing both human and nonhuman noises, as well as word fragments. It is shown that both the acoustic and the language modeling of these noises increase recognition performance significantly. In the experiments, a clustering of the noise classes is performed and the resulting cluster variants are compared, making it possible to determine the best trade-off between sensitivity and trainability of the models.
Mitchel Weintraub, SRI International (USA)
A new scoring algorithm has been developed for generating wordspotting hypotheses and their associated scores. This technique uses a large-vocabulary continuous speech recognition (LVCSR) system to generate the N-best answers along with their Viterbi alignments. The score for a putative hit is computed by summing the likelihoods of all hypotheses that contain the keyword and normalizing by the sum of all hypothesis likelihoods in the N-best list. Using a test set of conversational speech from Switchboard Credit Card conversations, we achieved an 81% figure of merit (FOM). Our word recognition error rate on this same test set is 54.7%.
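The scoring rule described above amounts to a keyword posterior estimated from the N-best list. A minimal sketch follows, assuming the list is available as (word sequence, log-likelihood) pairs; this representation is hypothetical, not the recognizer's actual interface.

    import numpy as np

    def keyword_score(nbest, keyword):
        # nbest: list of (word_sequence, log_likelihood) pairs from the LVCSR decoder.
        logliks = np.array([ll for _, ll in nbest])
        contains = np.array([keyword in words for words, _ in nbest])
        if not contains.any():
            return 0.0
        # Work in the log domain (log-sum-exp) so raw likelihoods do not underflow.
        m = logliks.max()
        total = np.exp(logliks - m).sum()
        keyword_mass = np.exp(logliks[contains] - m).sum()
        # Fraction of N-best probability mass assigned to hypotheses containing the keyword.
        return keyword_mass / total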
Chakib Tadj, Telecom Paris (FRANCE)
Franck Poirier, Telecom Paris (FRANCE)
In this paper, we present a novel hybrid keyword spotting system that combines supervised and unsupervised competitive learning algorithms. The first stage is a SOFM module specifically designed to discriminate between keywords (KWs) and non-keywords (NKWs). The second stage is an FDVQ (Fuzzy Dynamic Vector Quantization) module that discriminates among the KWs passed on by the first stage. Because FDVQ was not designed to represent acoustic garbage models, our standard FDVQ-based keyword spotter relied on threshold criteria to reject NKWs; this led us to introduce the upstream SOFM module designed for that task. The results show an improvement of about 9% in the accuracy of the system compared to our standard one.
Stephen V. Kosonocky, IBM T.J. Watson Research Center (USA)
Richard J. Mammone, Rutgers University (USA)
A new classifier is described that combines the discriminatory ability of the neural tree network (NTN) with the Gaussian mixture model to create a continuous density neural tree network (CDNTN). The CDNTN is used within a hidden Markov model (HMM), along with a nonparametric state duration model, to construct a continuous word spotting system for real-time applications. The new word spotting system does not use a general background model, allowing construction of independent keyword models whose performance does not depend on the number of models in the recognition system and supporting a direct parallel implementation. Although HMM word spotting systems are shown to provide good performance when sufficient training data is available, the CDNTN word spotting system is shown to outperform comparable HMM systems in applications where background speech data is not available or only a limited number of training tokens is available.
G.J.F. Jones, Cambridge University (UK)
J.T. Foote, Cambridge University (UK)
K. Sparck Jones, Cambridge University (UK)
S.J. Young, Cambridge University (UK)
The goal of the Video Mail Retrieval project is to integrate state-of-the-art document retrieval methods with high accuracy word spotting to yield a robust and efficient retrieval system. This paper describes a preliminary study to determine the extent to which retrieval precision is affected by word spotting performance. It includes a description of the database design, the word spotting algorithm, and the information retrieval method used. Results are presented which show audio retrieval performance very close to that of text.
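One simple way to couple spotted keywords with standard document retrieval is to rank messages by a term-weighted sum over putative keyword hits, discarding hits whose spotting score falls below a threshold. The sketch below uses inverse document frequency weighting over hypothetical hit lists; it illustrates the general combination only and is not the project's actual retrieval method.

    import math
    from collections import defaultdict

    def rank_messages(hits_per_message, query_keywords, score_threshold=0.5):
        # hits_per_message: {message_id: [(keyword, spotting_score), ...]}
        # Count, per keyword, how many messages contain at least one confident hit.
        doc_freq = defaultdict(int)
        n_messages = len(hits_per_message)
        for hits in hits_per_message.values():
            for kw in {k for k, s in hits if s >= score_threshold}:
                doc_freq[kw] += 1

        # Score each message by summing idf weights of confidently spotted query keywords.
        ranking = {}
        for msg_id, hits in hits_per_message.items():
            spotted = {k for k, s in hits if s >= score_threshold}
            ranking[msg_id] = sum(
                math.log((1 + n_messages) / (1 + doc_freq[kw]))
                for kw in query_keywords if kw in spotted
            )
        return sorted(ranking, key=ranking.get, reverse=True)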
Jerry H. Wright, Ensigma Limited (UK)
Michael J. Carey, Ensigma Limited (UK)
Eluned S. Parris, Ensigma Limited (UK)
Keywords are chosen on the basis of their usefulness for discriminating a topic from background speech. Good topic recognition can be achieved with a small set of well-chosen keywords, but particular combinations of keywords often achieve better discrimination than can be accounted for by regarding them as independent. This paper describes a higher-order statistical approach involving models of keyword-topic interdependence. A linear-logistic model brings some improvement in performance, but better results are obtained using log-linear contingency table models. Although the potential number of these is very large, good models tend to be simple and are suggested by heuristic measures inferred from the training data. The approach is tested using a broadcast radio database.
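For illustration, a linear-logistic topic score over keyword presence indicators might take the following form, with optional pairwise interaction terms capturing the keyword interdependence discussed above. The coefficients and data structures are hypothetical; the paper's preferred log-linear contingency table models are not reproduced here.

    import math

    def topic_probability(keyword_counts, weights, bias=0.0, interactions=None):
        # keyword_counts: {keyword: count in the transcript}
        # weights: {keyword: coefficient}; interactions: {(kw_a, kw_b): coefficient}
        present = {k for k, c in keyword_counts.items() if c > 0}
        score = bias + sum(w for k, w in weights.items() if k in present)
        for (a, b), w in (interactions or {}).items():
            if a in present and b in present:
                score += w  # reward (or penalize) co-occurring keyword pairs
        return 1.0 / (1.0 + math.exp(-score))  # logistic link: probability the topic is present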
Takeshi Kawabata, NTT Basic Research Labs (JAPAN)
This paper describes a new stochastic topic focusing mechanism for reducing the perplexity of natural spoken languages. In this mechanism, a predictive context-free grammar (CFG) parser analyzes input speech and generates grammar-rule sequences. These rule sequences drive a hidden Markov model (HMM), and the current topic is estimated as the HMM state distribution. The CFG rule probabilities are dynamically changed according to this topic state distribution. Evaluation of this mechanism using a large dialog text database confirms that it can effectively reduce the task perplexity.
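A minimal sketch of the topic-tracking step: the topic-state distribution is updated by a standard HMM forward recursion over the grammar-rule sequence emitted by the parser, and focused CFG rule probabilities are obtained by mixing per-topic rule distributions with that posterior. The matrices and function names below are illustrative assumptions, not taken from the paper.

    import numpy as np

    def update_topic_state(state_dist, rule_id, trans, emit):
        # state_dist: (S,) current topic-state distribution
        # trans: (S, S) topic transition matrix; emit: (S, R) rule probabilities per topic
        predicted = state_dist @ trans            # predict the next topic state
        posterior = predicted * emit[:, rule_id]  # weight by likelihood of the observed rule
        return posterior / posterior.sum()        # renormalize to a distribution

    def focused_rule_probabilities(state_dist, emit):
        # Topic-dependent CFG rule probabilities: a mixture of per-topic rule
        # distributions weighted by the current topic-state posterior.
        return state_dist @ emit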