Abstract: Session SP-19
SP-19.1
Application of simultaneous decoding algorithms to automatic transcription of known and unknown words
Jianxiong Wu,
Vishwa Gupta (Nortel, 16 Place du Commerce, Nuns Island, Verdun, Quebec, Canada, H3E 1H6)
This paper proposes simultaneous decoding using
multiple utterances to derive one or more allophonic
transcriptions for each word. Three possible
simultaneous decoding algorithms, namely the
N-best-based algorithm, the forward-backward-based
algorithm and the word-network-based algorithm,
are outlined. The proposed word-network-based
algorithm can incrementally decode a transcription
from any number of training utterances. Speech
recognition experiments for both known and unknown
word vocabularies show up to 16% reduction in word
error rate when simultaneously decoded allophonic
transcriptions are added to the recognition
dictionaries. This result holds even for dictionaries
originally transcribed by expert phoneticians.
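A minimal sketch of the N-best-based variant of the idea, assuming each training utterance of a word has already been decoded into a scored N-best list of allophone strings; the strings and scores below are hypothetical, and this is an illustration rather than the authors' algorithm.

from collections import defaultdict

def nbest_simultaneous_decode(nbest_lists, top_k=1):
    """Pick the transcription(s) whose total log-score across all
    training utterances of a word is highest.

    nbest_lists: one entry per utterance, each a list of
                 (allophone_string, log_score) pairs.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for nbest in nbest_lists:
        for transcription, log_score in nbest:
            total[transcription] += log_score
            count[transcription] += 1
    # Keep only transcriptions hypothesized for every utterance,
    # so the summed scores are comparable.
    candidates = {t: s for t, s in total.items()
                  if count[t] == len(nbest_lists)}
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    return ranked[:top_k]

# Hypothetical N-best lists for two utterances of the same word.
u1 = [("d ey t ax", -12.3), ("d ae t ax", -14.1)]
u2 = [("d ey t ax", -11.8), ("d ey dx ax", -13.0)]
print(nbest_simultaneous_decode([u1, u2]))   # ['d ey t ax']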
SP-19.2
High Quality Word Graphs Using Forward-Backward Pruning
Achim Sixtus (Lehrstuhl fuer Informatik VI, RWTH Aachen -- University of Technology, 52056 Aachen, Germany),
Stefan Ortmanns (Lucent Technologies -- Bell Labs., Murray Hill, NJ 07974, USA)
This paper presents an efficient method
for constructing high quality word graphs
for large vocabulary continuous speech recognition.
The word graphs are constructed in a two-pass strategy.
In the first pass, a huge word graph is produced using the
time-synchronous lexical tree search method.
Then, in the second pass, this huge word graph
is pruned by applying a modified forward-backward algorithm.
To analyze the characteristic properties of
this word graph pruning method, we present a detailed
comparison with the conventional time-synchronous forward pruning.
The recognition experiments, carried out on the
North American Business (NAB) 20000-word task,
demonstrate that, in comparison to the forward pruning,
the new method leads to a significant reduction in the size of
the word graph without an increase in the graph word error rate.
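A minimal sketch of forward-backward pruning on a toy acyclic word graph, using max (Viterbi-style) scores; the arc representation, scores and beam width are illustrative assumptions, not the authors' implementation.

def forward_backward_prune(arcs, start, end, beam):
    """Keep only arcs whose best complete path through them lies
    within `beam` of the globally best path.

    arcs: list of (from_node, to_node, word, log_score), with nodes
          topologically ordered by integer index.
    """
    NEG = float("-inf")
    fwd = {start: 0.0}
    for f, t, _, s in sorted(arcs, key=lambda a: a[0]):
        if f in fwd:
            fwd[t] = max(fwd.get(t, NEG), fwd[f] + s)
    bwd = {end: 0.0}
    for f, t, _, s in sorted(arcs, key=lambda a: a[1], reverse=True):
        if t in bwd:
            bwd[f] = max(bwd.get(f, NEG), bwd[t] + s)
    best = fwd[end]
    return [a for a in arcs
            if fwd.get(a[0], NEG) + a[3] + bwd.get(a[1], NEG) >= best - beam]

# Toy graph: two competing paths from node 0 to node 3.
graph = [(0, 1, "show", -2.0), (1, 3, "me", -1.0),
         (0, 2, "shown", -6.0), (2, 3, "knee", -1.5)]
print(forward_backward_prune(graph, start=0, end=3, beam=3.0))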
SP-19.3
Improved Spelling Recognition Using a Tree-Based Fast Lexical Match
Carl D Mitchell,
Anand R Setlur (Lucent Technologies)
This paper addresses the problem of selecting a name
from a very large list using spelling recognition.
In order to greatly reduce the computational resources required,
we propose a tree-based lexical fast match scheme to select a
short list of candidate names. Our system consists of a free
letter recognizer, a fast matcher, and a rescoring stage.
The letter recognizer uses n-grams to generate an n-best list
of letter hypotheses. The fast matcher is a tree that is based
on confusion classes, where a confusion class is
a group of acoustically similar letters such as the e-set.
The fast matcher reduces over 100,000 unique last names to
tens or hundreds of candidates. Then the rescoring stage picks
the best name using either letter alignment or a constrained grammar.
The fast matcher retained the correct name 99.6% of the time and the
system retrieved the correct name 97.6% of the time.
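A minimal sketch of the confusion-class idea behind the fast matcher, using a flat index as a stand-in for the tree; the class definitions, names and hypothesized spellings below are hypothetical.

from collections import defaultdict

# Illustrative confusion classes (real classes would be derived from
# acoustic confusability); 'E' stands in for the so-called e-set.
CLASSES = {"E": set("bcdegptvz"), "A": set("ahjk"),
           "I": set("iry"), "M": set("mn"), "Q": set("quw")}

def to_classes(spelling):
    """Map a spelling to its confusion-class string."""
    return "".join(next((c for c, s in CLASSES.items() if ch in s), ch)
                   for ch in spelling.lower())

def build_fast_match(names):
    """Index names by their confusion-class key."""
    index = defaultdict(list)
    for name in names:
        index[to_classes(name)].append(name)
    return index

index = build_fast_match(["PETE", "BEDE", "SMITH", "SMYTH"])
# Letter sequences hypothesized by the letter recognizer:
print(index[to_classes("bete")])    # ['PETE', 'BEDE']
print(index[to_classes("smith")])   # ['SMITH', 'SMYTH']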
SP-19.4
A Syllable-Synchronous Network Search Algorithm for Word Decoding in Chinese Speech Recognition
Fang Zheng (Speech Laboratory, Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, P.R. China)
The Chinese language is syllabic in nature, with frequent homonyms and severe word boundary uncertainty. This makes Chinese continuous speech recognition (CSR) difficult. In order to solve these problems, a Chinese syllable-synchronous network search (SSNS) algorithm is proposed. Together with the vocabulary word search tree and an N-gram based language model, the syllable-synchronous network search algorithm gives a good solution to Chinese syllable-to-word conversion. In addition, the algorithm is well suited to recognition of accented Chinese speech. The experimental results show that the SSNS algorithm achieves good overall performance in a continuous Chinese speech recognition system.
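A minimal sketch of a syllable-synchronous dynamic-programming conversion from syllables to words, assuming a toy syllabified vocabulary and bigram probabilities; all entries and values are illustrative, not from the paper.

import math

VOCAB = {"bei jing": "Beijing", "da xue": "daxue", "da": "da"}
BIGRAM = {("<s>", "Beijing"): 0.4, ("Beijing", "daxue"): 0.5,
          ("Beijing", "da"): 0.1}

def ssns(syllables, max_len=2):
    """At each syllable boundary keep the best-scoring word sequence
    ending there (a flat stand-in for the word search tree)."""
    n = len(syllables)
    best = {0: (0.0, ["<s>"])}          # boundary -> (log-prob, words)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if start not in best:
                continue
            key = " ".join(syllables[start:end])
            if key not in VOCAB:
                continue
            word = VOCAB[key]
            score, hist = best[start]
            p = BIGRAM.get((hist[-1], word), 1e-4)
            cand = (score + math.log(p), hist + [word])
            if end not in best or cand[0] > best[end][0]:
                best[end] = cand
    return best.get(n, (float("-inf"), ["<s>"]))[1][1:]

print(ssns(["bei", "jing", "da", "xue"]))   # ['Beijing', 'daxue']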
SP-19.5
A fast, sequential decoding algorithm with application to speaker verification
Qi Li (Bell Labs, Lucent Technologies)
When speaker verification (SV) technology is deployed in
real-world applications with a large user population,
system cost becomes an important issue. One needs
a fast algorithm which can support more users in a
central telephone switch given the limited hardware,
or can reduce the hardware requirement on a wireless
handset. In [1], a fast, sequential decoding algorithm
for left-to-right HMM was proposed. The algorithm is
based on a sequential detection scheme which is
asymptotically optimal in the sense of detecting a
possible change in distribution as reliably and quickly
as possible. In this paper, the algorithm is evaluated
in a fixed-phrase SV system on a database with 23,578
utterances recorded from 100 speakers. The experimental
results show that the decoding speed of the proposed
algorithm is about 7 to 10 times faster than the
Viterbi algorithm while the accuracy remains at an
acceptable level. The results indicate that the
proposed algorithm can also be applied to speaker
identification, utterance verification, audio
segmentation, voice/silence detection and many other
applications.
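A minimal sketch of the underlying idea: drive left-to-right state transitions by a CUSUM-style sequential test instead of a full Viterbi alignment. The score function, threshold and toy data are illustrative assumptions, not necessarily the algorithm of [1].

def sequential_decode(frame_loglikes, threshold=5.0):
    """Decode a left-to-right HMM one state at a time: accumulate the
    per-frame log-likelihood ratio of the next state versus the
    current state and advance when it exceeds a threshold.

    frame_loglikes: per-frame dicts {state_index: log p(x | state)}.
    Returns the state index assigned to each frame.
    """
    state, cusum, path = 0, 0.0, []
    n_states = len(frame_loglikes[0])
    for ll in frame_loglikes:
        if state + 1 < n_states:
            cusum = max(0.0, cusum + ll[state + 1] - ll[state])
            if cusum > threshold:
                state, cusum = state + 1, 0.0
        path.append(state)
    return path

# Frames 0-2 fit state 0, frames 3-5 fit state 1 (toy scores).
frames = [{0: -1.0, 1: -4.0}] * 3 + [{0: -4.0, 1: -1.0}] * 3
print(sequential_decode(frames, threshold=5.0))   # [0, 0, 0, 0, 1, 1]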
SP-19.6
Dynamic Programming Search Techniques for Across-Word Modelling in Speech Recognition
Klaus Beulen (RWTH Aachen),
Stefan Ortmanns (Lucent Technologies),
Christian Elting (RWTH Aachen)
We describe the integration of across-word models
in the RWTH large vocabulary continuous speech
recognition system, where our main focus is on the
realization of the acoustic recognition process.
This paper presents a study of two search methods
based on the principle of dynamic programming.
For both methods we discuss the implementation details
and give experimental results on the Verbmobil and on
the Wall Street Journal data. In addition, we introduce
a score interpolation of within-word and across-word
models for both search methods. In combination with
across-word models this interpolation technique
gives an improvement of the recognition accuracy by
14% relative to our standard system.
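One plausible reading of the score interpolation is a log-linear combination of the within-word and across-word acoustic scores; the interpolation weight below is an assumed placeholder, not a value from the paper, and this is an illustration rather than the authors' implementation.

def interpolate_scores(logp_within, logp_across, lam=0.5):
    """Log-linear interpolation of within-word and across-word
    acoustic model log-scores; lam (illustrative) weights the
    across-word model."""
    return lam * logp_across + (1.0 - lam) * logp_within

# Combine hypothetical log-likelihoods for one word hypothesis.
print(interpolate_scores(logp_within=-42.0, logp_across=-40.0))   # -41.0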
SP-19.7
Single-Tree Method for Grammar-Directed Search
Long Nguyen,
Richard Schwartz (BBN Technologies, GTE Internetworking)
In this paper we present a very fast and accurate fast-match algorithm which,
when followed by a regular beam search restricted within only the subset of
words selected by the fast-match, can speed up the recognition process by at
least two orders of magnitude in comparison to a typical single-pass speech
recognizer utilizing the Viterbi (or beam) search algorithm. In this search
strategy, the recognition vocabulary is structured as a single phonetic tree
in the fast-match pass. The search on this phonetic tree is a variation of the
Viterbi algorithm. In particular, we are able to use a word bigram language
model without making copies of the tree during the search. This is a novel
fast-match algorithm that has two important properties: high-accuracy
recognition and run-time proportional to only the cube root of the vocabulary
size.
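A minimal sketch of a single phonetic prefix tree with the bigram language model score applied only at word ends, so that no per-predecessor copies of the tree are needed. It ignores time alignment, and the lexicon, scores, bigram values and beam are hypothetical.

def build_phone_tree(lexicon):
    """Build a single phonetic prefix tree as nested dicts keyed by
    phone, with '#' marking word ends."""
    root = {}
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            node = node.setdefault(ph, {})
        node.setdefault("#", []).append(word)
    return root

def fast_match(root, phone_scores, bigram, prev_word, beam=4.0):
    """Score every tree path against per-position phone scores and add
    the bigram score only at word ends.  Returns the shortlist of
    words within `beam` of the best one."""
    results = {}
    def walk(node, depth, score):
        for ph, child in node.items():
            if ph == "#":
                for w in child:
                    lm = bigram.get((prev_word, w), -10.0)
                    results[w] = max(results.get(w, float("-inf")), score + lm)
            elif depth < len(phone_scores):
                walk(child, depth + 1, score + phone_scores[depth].get(ph, -10.0))
    walk(root, 0, 0.0)
    best = max(results.values())
    return [w for w, s in results.items() if s >= best - beam]

lexicon = {"see": ["s", "iy"], "she": ["sh", "iy"], "tea": ["t", "iy"]}
tree = build_phone_tree(lexicon)
scores = [{"s": -1.0, "sh": -2.5, "t": -6.0}, {"iy": -0.5}]
lm = {("i", "see"): -1.0, ("i", "she"): -1.2}
print(fast_match(tree, scores, lm, prev_word="i", beam=3.0))   # ['see', 'she']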
SP-19.8
Selection Criteria for Hypothesis Driven Lexical Adaptation
Petra Geutner (Universitaet Karlsruhe),
Michael Finke,
Alex Waibel (Carnegie Mellon University)
Adapting the vocabulary of a speech recognizer to the utterance to be
recognized has proven successful in reducing both high
out-of-vocabulary rates and word error rates. This applies
especially to languages that have a rapid vocabulary growth due to a
large number of inflections and compound words. This paper presents
various adaptation methods within the Hypothesis Driven Lexical
Adaptation (HDLA) framework which allow speech recognition on a
virtually unlimited vocabulary. Selection criteria for the adaptation
process are either based on morphological knowledge or distance
measures at phoneme or grapheme level. Different methods are
introduced for determining distances between phoneme pairs and for
creating the large fallback lexicon from which the adapted vocabulary
is chosen. HDLA reduces the out-of-vocabulary rate by 55% for
Serbo-Croatian, 35% for German and 27% for Turkish. The reduced
out-of-vocabulary rate also decreases the word error rate by an
absolute 4.1% to 25.4% on Serbo-Croatian broadcast news data.
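A minimal sketch of a grapheme-level selection criterion: fallback words within a small edit distance of a first-pass hypothesis are added to the active vocabulary. The distance threshold and word lists are illustrative assumptions, not the paper's settings.

def edit_distance(a, b):
    """Plain grapheme-level Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def adapt_vocabulary(hypotheses, fallback_lexicon, max_dist=2):
    """Select every fallback word within `max_dist` grapheme edits of
    some first-pass hypothesis for addition to the vocabulary."""
    added = set()
    for hyp in hypotheses:
        for word in fallback_lexicon:
            if word not in added and edit_distance(hyp, word) <= max_dist:
                added.add(word)
    return added

# Toy first-pass hypothesis and fallback lexicon.
print(adapt_vocabulary(["regierungs"], ["regierung", "regierungen", "regen"]))
# {'regierung', 'regierungen'}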