Authors:
Jianxiong Wu, Nortel, 16 Place du Commerce, Nuns Island, Verdun, Quebec, Canada, H3E 1H6 (Canada)
Vishwa Gupta, Nortel, 16 Place du Commerce, Nuns Island, Verdun, Quebec, Canada, H3E 1H6 (Canada)
Page (NA) Paper number 1267
Abstract:
This paper proposes simultaneous decoding using multiple utterances
to derive one or more allophonic transcriptions for each word. Three
possible simultaneous decoding algorithms, namely the N-best-based
algorithm, the forward-backward-based algorithm and the word-network-based
algorithm, are outlined. The proposed word-network-based algorithm
can incrementally decode a transcription from any number of training
utterances. Speech recognition experiments for both known and unknown
word vocabularies show up to 16% reduction in word error rate when
simultaneously decoded allophonic transcriptions are added to the recognition
dictionaries. This result holds even for dictionaries originally transcribed
by expert phoneticians.
Authors:
Achim Sixtus, Lehrstuhl fuer Informatik VI, RWTH Aachen -- University of Technology, 52056 Aachen, Germany (Germany)
Stefan Ortmanns, Lucent Technologies -- Bell Labs., Murray Hill, NJ 07974, USA (USA)
Page (NA) Paper number 1862
Abstract:
This paper presents an efficient method for constructing high-quality
word graphs for large vocabulary continuous speech recognition. The
word graphs are constructed in a two-pass strategy. In the first pass,
a huge word graph is produced using the time-synchronous lexical tree
search method. Then, in the second pass, this huge word graph is pruned
by applying a modified forward-backward algorithm. To analyze the characteristic
properties of this word graph pruning method, we present a detailed
comparison with the conventional time-synchronous forward pruning.
The recognition experiments, carried out on the North American Business
(NAB) 20000-word task, demonstrate that, in comparison to the forward
pruning, the new method leads to a significant reduction in the size
of the word graph without an increase in the graph word error rate.
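The second-pass idea can be illustrated with a minimal sketch. It is
not the authors' modified forward-backward algorithm: it assumes a
Viterbi-style (max) approximation, integer node ids ordered in time
(src < dst for every edge), and a hypothetical `beam` parameter. An
edge is kept only if the best complete path through it scores within
`beam` of the overall best path.

```python
from collections import defaultdict

NEG_INF = float("-inf")

def prune_word_graph(edges, start, end, beam):
    """Prune a word graph by best-path-through-edge scores.

    edges: list of (src, dst, word, log_score), with src < dst (assumption).
    beam:  hypothetical pruning threshold in the log-score domain.
    """
    fwd = defaultdict(lambda: NEG_INF)  # best score from start to each node
    bwd = defaultdict(lambda: NEG_INF)  # best score from each node to end
    fwd[start] = 0.0
    bwd[end] = 0.0

    # Forward pass: edges sorted by source node are in topological order.
    for src, dst, _, score in sorted(edges, key=lambda e: e[0]):
        fwd[dst] = max(fwd[dst], fwd[src] + score)

    # Backward pass: edges sorted by destination node, latest first.
    for src, dst, _, score in sorted(edges, key=lambda e: e[1], reverse=True):
        bwd[src] = max(bwd[src], bwd[dst] + score)

    best = fwd[end]
    # Keep an edge if the best path through it is within the beam.
    return [e for e in edges if fwd[e[0]] + e[3] + bwd[e[1]] >= best - beam]

# Toy graph (illustrative data only).
toy_edges = [
    (0, 1, "a", -1.0),
    (0, 1, "b", -5.0),
    (1, 2, "c", -1.0),
    (1, 2, "d", -1.5),
]
kept_words = [e[2] for e in prune_word_graph(toy_edges, 0, 2, beam=1.0)]
```

Unlike forward-only pruning, the decision for each edge here uses
information from the whole utterance, which is why such a pass can
discard edges that looked promising only locally.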
Authors:
Carl D Mitchell
Anand R Setlur
Page (NA) Paper number 2429
Abstract:
This paper addresses the problem of selecting a name from a very large
list using spelling recognition. In order to greatly reduce the computational
resources required, we propose a tree-based lexical fast match scheme
to select a short list of candidate names. Our system consists of a
free letter recognizer, a fast matcher, and a rescoring stage. The
letter recognizer uses n-grams to generate an n-best list of letter
hypotheses. The fast matcher is a tree that is based on confusion classes,
where a confusion class is a group of acoustically similar letters
such as the e-set. The fast matcher reduces over 100,000 unique last
names to tens or hundreds of candidates. Then the rescoring stage picks
the best name using either letter alignment or a constrained grammar.
The fast matcher retained the correct name 99.6% of the time and the
system retrieved the correct name 97.6% of the time.
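A tree keyed on confusion classes can be sketched as follows. This is
an illustration under stated assumptions, not the authors' system: only
the e-set (B, C, D, E, G, P, T, V, Z) comes from the abstract; the other
classes, the name list, and the function names are hypothetical, and the
sketch handles substitutions within a class only (no insertions or
deletions).

```python
# Confusion classes: the first is the e-set; the rest are illustrative guesses.
CONFUSION_CLASSES = ["BCDEGPTVZ", "AJK", "MN", "FSX"]
LETTER_TO_CLASS = {
    ch: idx for idx, group in enumerate(CONFUSION_CLASSES) for ch in group
}

def class_key(ch):
    """Map a letter to its confusion class; unlisted letters map to themselves."""
    return LETTER_TO_CLASS.get(ch, ch)

def build_tree(names):
    """Build a prefix tree whose branches are confusion classes, not letters."""
    root = {}
    for name in names:
        node = root
        for ch in name.upper():
            node = node.setdefault(class_key(ch), {})
        node.setdefault("$names", []).append(name)  # names ending at this node
    return root

def fast_match(tree, letters):
    """Return all names whose class sequence equals that of the letter string."""
    node = tree
    for ch in letters.upper():
        node = node.get(class_key(ch))
        if node is None:
            return []
    return node.get("$names", [])

tree = build_tree(["BENT", "DENT", "BEND", "SMITH"])
candidates = fast_match(tree, "PENT")  # "P" misrecognized for "B" or "D"
```

The short list returned here would then go to a rescoring stage that
picks a single name.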
Authors:
Fang Zheng, Speech Laboratory, Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, P.R. China (China)
Page (NA) Paper number 1217
Abstract:
The Chinese language is syllabic in nature, with frequent homonyms
and severe word-boundary uncertainty. This makes Chinese continuous
speech recognition (CSR) somewhat difficult. To address these
problems, a Chinese syllable-synchronous network search (SSNS)
algorithm is proposed. Together with the vocabulary word search
tree and the N-gram-based language model, the SSNS algorithm gives
a good solution to Chinese syllable-to-word conversion. In addition,
the algorithm is well suited to accented Chinese speech recognition.
Experimental results show that the SSNS algorithm achieves good
overall performance in a continuous Chinese speech recognition system.
Authors:
Qi Li
Page (NA) Paper number 2361
Abstract:
To implement speaker verification (SV) technology for real-world applications
with a large user population, the system cost becomes an important
issue. One needs a fast algorithm which can support more users in a
central telephone switch given the limited hardware, or can reduce
the hardware requirement on a wireless handset. In [1], a fast, sequential
decoding algorithm for left-to-right HMM was proposed. The algorithm
is based on a sequential detection scheme which is asymptotically optimal
in the sense of detecting a possible change in distribution as reliably
and quickly as possible. In this paper, the algorithm is evaluated
in a fixed-phrase SV system on a database with 23,578 utterances recorded
from 100 speakers. The experimental results show that the decoding
speed of the proposed algorithm is about 7 to 10 times faster than
the Viterbi algorithm, while the accuracy remains at an acceptable level.
The results indicate that the proposed algorithm can also be applied
to speaker identification, utterance verification, audio segmentation,
voice/silence detection and many other applications.
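For readers unfamiliar with sequential change detection, Page's CUSUM
test is one classical scheme of this kind. The sketch below is only an
illustration of that general idea; whether the algorithm in [1] takes
this exact form is not stated in the abstract. The single-Gaussian
state models, the threshold, and the function names are assumptions. In
a left-to-right HMM, "before" and "after" would play the role of the
current and next state.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def cusum_change_point(samples, before, after, threshold):
    """Return the first index at which the CUSUM statistic exceeds threshold.

    before, after: (mean, var) of the assumed pre- and post-change Gaussians.
    threshold:     hypothetical detection threshold.
    Returns None if no change is declared.
    """
    stat = 0.0
    for t, x in enumerate(samples):
        # Log-likelihood ratio of "changed" versus "not changed yet".
        llr = gaussian_logpdf(x, *after) - gaussian_logpdf(x, *before)
        # Page's recursion: accumulate evidence, never going below zero.
        stat = max(0.0, stat + llr)
        if stat > threshold:
            return t
    return None

# Toy signal (illustrative data only): the mean shifts from 0 to 3 at index 5.
signal = [0.0] * 5 + [3.0] * 5
detected_at = cusum_change_point(signal, (0.0, 1.0), (3.0, 1.0), threshold=4.0)
```

Each frame is processed once and the state boundary is declared as
soon as the statistic crosses the threshold, which is where the speed
advantage over a full Viterbi alignment would come from.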
Authors:
Klaus Beulen
Stefan Ortmanns
Christian Elting
Page (NA) Paper number 2074
Abstract:
We describe the integration of across-word models in the RWTH large
vocabulary continuous speech recognition system, where our main focus
is on the realization of the acoustic recognition process. This paper
presents a study of two search methods based on the principle of dynamic
programming. For both methods we discuss the implementation details
and give experimental results on the Verbmobil and on the Wall Street
Journal data. In addition, we introduce a score interpolation of within-word
and across-word models for both search methods. In combination with
across-word models this interpolation technique gives an improvement
of the recognition accuracy by 14% relative to our standard system.
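The abstract does not give the form of the score interpolation. A
minimal sketch, assuming a linear interpolation of log-scores with a
hypothetical weight `lam` (all names and numbers below are
illustrative):

```python
def interpolate_scores(within_word, across_word, lam):
    """Blend two acoustic log-scores; lam in [0, 1] is a hypothetical weight."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * within_word + lam * across_word

def best_hypothesis(hypotheses, lam):
    """Pick the word sequence with the highest interpolated score.

    hypotheses: list of (words, within_word_score, across_word_score).
    """
    return max(hypotheses, key=lambda h: interpolate_scores(h[1], h[2], lam))[0]

# Illustrative scores only.
hyps = [("a b", -10.0, -14.0), ("a c", -12.0, -11.0)]
choice = best_hypothesis(hyps, lam=0.5)
```

With `lam = 0` the search relies on within-word models alone, and with
`lam = 1` on across-word models alone; intermediate values trade the two
off.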
Authors:
Long Nguyen
Richard Schwartz
Page (NA) Paper number 2393
Abstract:
In this paper we present a very fast and accurate fast-match algorithm
which, when followed by a regular beam search restricted within only
the subset of words selected by the fast-match, can speed up the recognition
process by at least two orders of magnitude in comparison to a typical
single-pass speech recognizer utilizing the Viterbi (or beam) search
algorithm. In this search strategy, the recognition vocabulary is structured
as a single phonetic tree in the fast-match pass. The search on this
phonetic tree is a variation of the Viterbi algorithm. In particular,
we are able to use a word bigram language model without making copies
of the tree during the search. This is a novel fast-match algorithm
that has two important properties: high-accuracy recognition and run-time
proportional to only the cube root of the vocabulary size.
Authors:
Petra Geutner
Michael Finke
Alex Waibel
Page (NA) Paper number 1999
Abstract:
Adapting the vocabulary of a speech recognizer to the utterance to
be recognized has proven successful in reducing both high out-of-vocabulary
rates and word error rates. This applies especially to languages that
have a rapid vocabulary growth due to a large number of inflections
and composita. This paper presents various adaptation methods within
the Hypothesis Driven Lexical Adaptation (HDLA) framework which allow
speech recognition on a virtually unlimited vocabulary. Selection criteria
for the adaptation process are either based on morphological knowledge
or distance measures at phoneme or grapheme level. Different methods
are introduced for determining distances between phoneme pairs and
for creating the large fallback lexicon the adapted vocabulary is chosen
from. HDLA reduces the out-of-vocabulary rate by 55% for Serbo-Croatian,
35% for German and 27% for Turkish. The reduced out-of-vocabulary rate
also decreases the word error rate by an absolute 4.1% to 25.4% on
Serbo-Croatian broadcast news data.
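Selection by a grapheme-level distance can be sketched as follows.
This assumes plain Levenshtein (edit) distance and a hypothetical
`max_distance` cutoff; the distance measures actually used in HDLA may
differ, and the word lists are illustrative.

```python
def levenshtein(a, b):
    """Edit distance between two strings (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]

def adapt_vocabulary(hypothesis_words, fallback_lexicon, max_distance):
    """Add fallback-lexicon words close to any first-pass hypothesis word."""
    selected = set(hypothesis_words)
    for hyp in hypothesis_words:
        for cand in fallback_lexicon:
            if levenshtein(hyp, cand) <= max_distance:
                selected.add(cand)
    return selected

# Illustrative data: one first-pass word and a tiny fallback lexicon.
adapted = adapt_vocabulary(["haus"], ["hauses", "maus", "auto"], max_distance=2)
```

A second recognition pass would then run with the adapted vocabulary,
which is how inflected forts missing from the first-pass lexicon could
become recognizable.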