Abstract: Session SP-19
SP-19.1
Application of simultaneous decoding algorithms to automatic transcription of known and unknown words
Jianxiong Wu,
Vishwa Gupta (Nortel, 16 Place du Commerce, Nuns Island, Verdun, Quebec, Canada, H3E 1H6)
This paper proposes simultaneous decoding using
multiple utterances to derive one or more allophonic
transcriptions for each word. Three possible
simultaneous decoding algorithms, namely the
N-best-based algorithm, the forward-backward-based
algorithm and the word-network-based algorithm,
are outlined. The proposed word-network-based
algorithm can incrementally decode a transcription
from any number of training utterances. Speech
recognition experiments for both known and unknown
word vocabularies show up to 16% reduction in word
error rate when simultaneously decoded allophonic
transcriptions are added to the recognition
dictionaries. This result holds even for dictionaries
originally transcribed by expert phoneticians.
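A minimal sketch of the N-best-based variant of the idea, assuming each training utterance of a word has already been decoded into a scored N-best list of allophone strings; the strings and scores below are hypothetical, and this is an illustration rather than the authors' algorithm.

from collections import defaultdict

def nbest_simultaneous_decode(nbest_lists, top_k=1):
    """Pick the transcription(s) whose total log-score across all
    training utterances of a word is highest.

    nbest_lists: one entry per utterance, each a list of
                 (allophone_string, log_score) pairs.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for nbest in nbest_lists:
        for transcription, log_score in nbest:
            total[transcription] += log_score
            count[transcription] += 1
    # Keep only transcriptions hypothesized for every utterance,
    # so the summed scores are comparable.
    candidates = {t: s for t, s in total.items()
                  if count[t] == len(nbest_lists)}
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    return ranked[:top_k]

# Hypothetical N-best lists for two utterances of the same word.
u1 = [("d ey t ax", -12.3), ("d ae t ax", -14.1)]
u2 = [("d ey t ax", -11.8), ("d ey dx ax", -13.0)]
print(nbest_simultaneous_decode([u1, u2]))   # ['d ey t ax']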
SP-19.2
High Quality Word Graphs Using Forward-Backward Pruning
Achim Sixtus (Lehrstuhl fuer Informatik VI, RWTH Aachen -- University of Technology, 52056 Aachen, Germany),
Stefan Ortmanns (Lucent Technologies -- Bell Labs., Murray Hill, NJ 07974, USA)
This paper presents an efficient method
for constructing high quality word graphs
for large vocabulary continuous speech recognition.
The word graphs are constructed in a two-pass strategy.
In the first pass, a huge word graph is produced using the
time-synchronous lexical tree search method.
Then, in the second pass, this huge word graph
is pruned by applying a modified forward-backward algorithm.
To analyze the characteristic properties of
this word graph pruning method, we present a detailed
comparison with the conventional time-synchronous forward pruning.
The recognition experiments, carried out on the
North American Business (NAB) 20000-word task,
demonstrate that, in comparison to the forward pruning,
the new method leads to a significant reduction in the size of
the word graph without an increase in the graph word error rate.
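A minimal sketch of forward-backward pruning on a toy acyclic word graph, using max (Viterbi-style) scores; the arc representation, scores and beam width are illustrative assumptions, not the authors' implementation.

def forward_backward_prune(arcs, start, end, beam):
    """Keep only arcs whose best complete path through them lies
    within `beam` of the globally best path.

    arcs: list of (from_node, to_node, word, log_score), with nodes
          topologically ordered by integer index.
    """
    NEG = float("-inf")
    fwd = {start: 0.0}
    for f, t, _, s in sorted(arcs, key=lambda a: a[0]):
        if f in fwd:
            fwd[t] = max(fwd.get(t, NEG), fwd[f] + s)
    bwd = {end: 0.0}
    for f, t, _, s in sorted(arcs, key=lambda a: a[1], reverse=True):
        if t in bwd:
            bwd[f] = max(bwd.get(f, NEG), bwd[t] + s)
    best = fwd[end]
    return [a for a in arcs
            if fwd.get(a[0], NEG) + a[3] + bwd.get(a[1], NEG) >= best - beam]

# Toy graph: two competing paths from node 0 to node 3.
graph = [(0, 1, "show", -2.0), (1, 3, "me", -1.0),
         (0, 2, "shown", -6.0), (2, 3, "knee", -1.5)]
print(forward_backward_prune(graph, start=0, end=3, beam=3.0))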
SP-19.3
Improved Spelling Recognition Using a Tree-Based Fast Lexical Match
Carl D Mitchell,
Anand R Setlur (Lucent Technologies)
This paper addresses the problem of selecting a name
from a very large list using spelling recognition.
In order to greatly reduce the computational resources required,
we propose a tree-based lexical fast match scheme to select a
short list of candidate names. Our system consists of a free
letter recognizer, a fast matcher, and a rescoring stage.
The letter recognizer uses n-grams to generate an n-best list
of letter hypotheses. The fast matcher is a tree that is based
on confusion classes, where a confusion class is
a group of acoustically similar letters such as the e-set.
The fast matcher reduces over 100,000 unique last names to
tens or hundreds of candidates. Then the rescoring stage picks
the best name using either letter alignment or a constrained grammar.
The fast matcher retained the correct name 99.6% of the time and the
system retrieved the correct name 97.6% of the time.
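A minimal sketch of the confusion-class idea behind the fast matcher, using a flat index as a stand-in for the tree; the class definitions, names and hypothesized spellings below are hypothetical.

from collections import defaultdict

# Illustrative confusion classes (real classes would be derived from
# acoustic confusability); 'E' stands in for the so-called e-set.
CLASSES = {"E": set("bcdegptvz"), "A": set("ahjk"),
           "I": set("iry"), "M": set("mn"), "Q": set("quw")}

def to_classes(spelling):
    """Map a spelling to its confusion-class string."""
    return "".join(next((c for c, s in CLASSES.items() if ch in s), ch)
                   for ch in spelling.lower())

def build_fast_match(names):
    """Index names by their confusion-class key."""
    index = defaultdict(list)
    for name in names:
        index[to_classes(name)].append(name)
    return index

index = build_fast_match(["PETE", "BEDE", "SMITH", "SMYTH"])
# Letter sequences hypothesized by the letter recognizer:
print(index[to_classes("bete")])    # ['PETE', 'BEDE']
print(index[to_classes("smith")])   # ['SMITH', 'SMYTH']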
SP-19.4
A Syllable-Synchronous Network Search Algorithm for Word Decoding in Chinese Speech Recognition
Fang Zheng (Speech Laboratory, Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, P.R. China)
The Chinese language is syllabic in nature, with frequent homonyms and severe word boundary uncertainty. This makes Chinese continuous speech recognition (CSR) difficult. In order to solve these problems, a Chinese syllable-synchronous network search (SSNS) algorithm is proposed. Together with the vocabulary word search tree and an N-gram based language model, the syllable-synchronous network search algorithm gives a good solution to Chinese syllable-to-word conversion. In addition, the algorithm is well suited to recognition of accented Chinese speech. The experimental results show that the SSNS algorithm achieves good overall performance in a continuous Chinese speech recognition system.
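A minimal sketch of a syllable-synchronous dynamic-programming conversion from syllables to words, assuming a toy syllabified vocabulary and bigram probabilities; all entries and values are illustrative, not from the paper.

import math

VOCAB = {"bei jing": "Beijing", "da xue": "daxue", "da": "da"}
BIGRAM = {("<s>", "Beijing"): 0.4, ("Beijing", "daxue"): 0.5,
          ("Beijing", "da"): 0.1}

def ssns(syllables, max_len=2):
    """At each syllable boundary keep the best-scoring word sequence
    ending there (a flat stand-in for the word search tree)."""
    n = len(syllables)
    best = {0: (0.0, ["<s>"])}          # boundary -> (log-prob, words)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if start not in best:
                continue
            key = " ".join(syllables[start:end])
            if key not in VOCAB:
                continue
            word = VOCAB[key]
            score, hist = best[start]
            p = BIGRAM.get((hist[-1], word), 1e-4)
            cand = (score + math.log(p), hist + [word])
            if end not in best or cand[0] > best[end][0]:
                best[end] = cand
    return best.get(n, (float("-inf"), ["<s>"]))[1][1:]

print(ssns(["bei", "jing", "da", "xue"]))   # ['Beijing', 'daxue']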
SP-19.5
A fast, sequential decoding algorithm with application to speaker verification
Qi Li (Bell Labs, Lucent Technologies)
When speaker verification (SV) technology is deployed in
real-world applications with a large user population,
system cost becomes an important issue. One needs
a fast algorithm which can support more users in a
central telephone switch given the limited hardware,
or can reduce the hardware requirement on a wireless
handset. In [1], a fast, sequential decoding algorithm
for left-to-right HMM was proposed. The algorithm is
based on a sequential detection scheme which is
asymptotically optimal in the sense of detecting a
possible change in distribution as reliably and quickly
as possible. In this paper, the algorithm is evaluated
in a fixed-phrase SV system on a database with 23,578
utterances recorded from 100 speakers. The experimental
results show that the decoding speed of the proposed
algorithm is about 7 to 10 times faster than the
Viterbi algorithm while the accuracy remains at an
acceptable level. The results indicate that the
proposed algorithm can also be applied to speaker
identification, utterance verification, audio
segmentation, voice/silence detection and many other
applications.
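A minimal sketch of the underlying idea: drive left-to-right state transitions by a CUSUM-style sequential test instead of a full Viterbi alignment. The score function, threshold and toy data are illustrative assumptions, not necessarily the algorithm of [1].

def sequential_decode(frame_loglikes, threshold=5.0):
    """Decode a left-to-right HMM one state at a time: accumulate the
    per-frame log-likelihood ratio of the next state versus the
    current state and advance when it exceeds a threshold.

    frame_loglikes: per-frame dicts {state_index: log p(x | state)}.
    Returns the state index assigned to each frame.
    """
    state, cusum, path = 0, 0.0, []
    n_states = len(frame_loglikes[0])
    for ll in frame_loglikes:
        if state + 1 < n_states:
            cusum = max(0.0, cusum + ll[state + 1] - ll[state])
            if cusum > threshold:
                state, cusum = state + 1, 0.0
        path.append(state)
    return path

# Frames 0-2 fit state 0, frames 3-5 fit state 1 (toy scores).
frames = [{0: -1.0, 1: -4.0}] * 3 + [{0: -4.0, 1: -1.0}] * 3
print(sequential_decode(frames, threshold=5.0))   # [0, 0, 0, 0, 1, 1]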
SP-19.6
Dynamic Programming Search Techniques for Across-Word Modelling in Speech Recognition
Klaus Beulen (RWTH Aachen),
Stefan Ortmanns (Lucent Technologies),
Christian Elting (RWTH Aachen)
We describe the integration of across-word models
in the RWTH large vocabulary continuous speech
recognition system, where our main focus is on the
realization of the acoustic recognition process.
This paper presents a study of two search methods
based on the principle of dynamic programming.
For both methods we discuss the implementation details
and give experimental results on the Verbmobil and on
the Wall Street Journal data. In addition, we introduce
a score interpolation of within-word and across-word
models for both search methods. In combination with
across-word models this interpolation technique
gives an improvement of the recognition accuracy by
14% relative to our standard system.
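One plausible reading of the score interpolation is a log-linear combination of the within-word and across-word acoustic scores; the interpolation weight below is an assumed placeholder, not a value from the paper, and this is an illustration rather than the authors' implementation.

def interpolate_scores(logp_within, logp_across, lam=0.5):
    """Log-linear interpolation of within-word and across-word
    acoustic model log-scores; lam (illustrative) weights the
    across-word model."""
    return lam * logp_across + (1.0 - lam) * logp_within

# Combine hypothetical log-likelihoods for one word hypothesis.
print(interpolate_scores(logp_within=-42.0, logp_across=-40.0))   # -41.0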
SP-19.7
Single-Tree Method for Grammar-Directed Search
Long Nguyen,
Richard Schwartz (BBN Technologies, GTE Internetworking)
In this paper we present a very fast and accurate fast-match algorithm which,
when followed by a regular beam search restricted within only the subset of
words selected by the fast-match, can speed up the recognition process by at
least two orders of magnitude in comparison to a typical single-pass speech
recognizer utilizing the Viterbi (or beam) search algorithm. In this search
strategy, the recognition vocabulary is structured as a single phonetic tree
in the fast-match pass. The search on this phonetic tree is a variation of the
Viterbi algorithm. In particular, we are able to use a word bigram language
model without making copies of the tree during the search. This is a novel
fast-match algorithm that has two important properties: high-accuracy
recognition and run-time proportional to only the cube root of the vocabulary
size.
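A minimal sketch of a single phonetic prefix tree with the bigram language model score applied only at word ends, so that no per-predecessor copies of the tree are needed. It ignores time alignment, and the lexicon, scores, bigram values and beam are hypothetical.

def build_phone_tree(lexicon):
    """Build a single phonetic prefix tree as nested dicts keyed by
    phone, with '#' marking word ends."""
    root = {}
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            node = node.setdefault(ph, {})
        node.setdefault("#", []).append(word)
    return root

def fast_match(root, phone_scores, bigram, prev_word, beam=4.0):
    """Score every tree path against per-position phone scores and add
    the bigram score only at word ends.  Returns the shortlist of
    words within `beam` of the best one."""
    results = {}
    def walk(node, depth, score):
        for ph, child in node.items():
            if ph == "#":
                for w in child:
                    lm = bigram.get((prev_word, w), -10.0)
                    results[w] = max(results.get(w, float("-inf")), score + lm)
            elif depth < len(phone_scores):
                walk(child, depth + 1, score + phone_scores[depth].get(ph, -10.0))
    walk(root, 0, 0.0)
    best = max(results.values())
    return [w for w, s in results.items() if s >= best - beam]

lexicon = {"see": ["s", "iy"], "she": ["sh", "iy"], "tea": ["t", "iy"]}
tree = build_phone_tree(lexicon)
scores = [{"s": -1.0, "sh": -2.5, "t": -6.0}, {"iy": -0.5}]
lm = {("i", "see"): -1.0, ("i", "she"): -1.2}
print(fast_match(tree, scores, lm, prev_word="i", beam=3.0))   # ['see', 'she']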
SP-19.8
Selection Criteria for Hypothesis Driven Lexical Adaptation
Petra Geutner (Universitaet Karlsruhe),
Michael Finke,
Alex Waibel (Carnegie Mellon University)
Adapting the vocabulary of a speech recognizer to the utterance to be
recognized has proven successful in reducing both high
out-of-vocabulary rates and word error rates. This applies
especially to languages that have a rapid vocabulary growth due to a
large number of inflections and compound words. This paper presents
various adaptation methods within the Hypothesis Driven Lexical
Adaptation (HDLA) framework which allow speech recognition on a
virtually unlimited vocabulary. Selection criteria for the adaptation
process are either based on morphological knowledge or distance
measures at phoneme or grapheme level. Different methods are
introduced for determining distances between phoneme pairs and for
creating the large fallback lexicon from which the adapted vocabulary
is chosen. HDLA reduces the out-of-vocabulary rate by 55% for
Serbo-Croatian, 35% for German and 27% for Turkish. The reduced
out-of-vocabulary rate also decreases the word error rate by an
absolute 4.1% to 25.4% on Serbo-Croatian broadcast news data.
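A minimal sketch of a grapheme-level selection criterion: fallback words within a small edit distance of a first-pass hypothesis are added to the active vocabulary. The distance threshold and word lists are illustrative assumptions, not the paper's settings.

def edit_distance(a, b):
    """Plain grapheme-level Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def adapt_vocabulary(hypotheses, fallback_lexicon, max_dist=2):
    """Select every fallback word within `max_dist` grapheme edits of
    some first-pass hypothesis for addition to the vocabulary."""
    added = set()
    for hyp in hypotheses:
        for word in fallback_lexicon:
            if word not in added and edit_distance(hyp, word) <= max_dist:
                added.add(word)
    return added

# Toy first-pass hypothesis and fallback lexicon.
print(adapt_vocabulary(["regierungs"], ["regierung", "regierungen", "regen"]))
# {'regierung', 'regierungen'}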