Authors:
Steven Wegmann,
Puming Zhan,
Larry Gillick,
Page (NA) Paper number 1912
Abstract:
In this paper we shall report on recent progress in acoustic modelling
and preprocessing in our Broadcast News transcription system. We have
gone back to basics in acoustic modelling, and re-examined some of
our standard practices, in particular the use of IMELDA and frequency
warping, in the context of the Broadcast News corpus. We shall also
report on some preliminary experiments with a generalization of IMELDA,
"semi-tied covariances". In combination, these improvements lead to
a 3.5% absolute improvement over our eval97 models. We shall also describe
our attempts to fix our rather primitive, silence-based preprocessing
system, including initial results using a new speaker-change detection
algorithm based on Hotelling's T²-test.
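For readers unfamiliar with the statistic, the following is a minimal sketch of a two-sample Hotelling's T² computation of the kind that could be used to compare the feature frames on either side of a candidate speaker-change point. The abstract does not give the details of the algorithm, so the pooled-covariance form, the function names and the windowing are all assumptions made for illustration.

```python
def mean(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter(rows, mu):
    """Sum of outer products of the deviations from the mean."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes `a` is non-singular; a singular pooled covariance would
    raise ZeroDivisionError here.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def hotelling_t2(left, right):
    """Two-sample Hotelling's T^2 with a pooled covariance estimate.

    `left` and `right` are hypothetical windows of feature vectors
    taken before and after a candidate change point.
    """
    n1, n2 = len(left), len(right)
    mu1, mu2 = mean(left), mean(right)
    s1, s2 = scatter(left, mu1), scatter(right, mu2)
    d = len(mu1)
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(d)]
              for i in range(d)]
    diff = [a - b for a, b in zip(mu1, mu2)]
    w = solve(pooled, diff)  # pooled^{-1} (mu1 - mu2)
    return (n1 * n2 / (n1 + n2)) * sum(a * b for a, b in zip(diff, w))
```

In such a scheme a large value at a candidate boundary would suggest that the two windows come from different speakers; how the threshold is chosen is not stated in the abstract.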
Authors:
Scott S Chen,
Ellen M Eide,
Mark J.F. Gales,
Ramesh A Gopinath,
Dimitri Kanevsky,
Peder A Olsen,
Page (NA) Paper number 2096
Abstract:
We describe recent extensions and improvements to IBM's system for
automatic transcription of broadcast news. The speech recognizer uses
a total of 160 hours of acoustic transcription, 80 hours more than
for the 1997 Hub4 evaluation. In addition to improvements obtained
in 1997, we made a number of changes and algorithmic enhancements. Among
these were changing the acoustic vocabulary, reducing the number of
phonemes, insertion of short pauses, mixture models consisting of non-Gaussian
components, pronunciation networks, factor analysis (FACILT) and the Bayesian
Information Criterion (BIC) applied to choosing the number of components
in a Gaussian mixture model. The models were combined in a single system
using NIST's script voting machine known as ROVER.
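As an illustration of BIC-based model selection, the sketch below scores candidate mixture sizes with the textbook form BIC = -2 log L + k log n and keeps the smallest score. The abstract does not say which variant of the criterion or which covariance structure was used, so the diagonal-covariance parameter count and all function names are assumptions, and the log-likelihoods in the usage note are made-up numbers.

```python
import math


def gmm_param_count(n_components, dim):
    """Free parameters of a diagonal-covariance Gaussian mixture (assumed form).

    Each component has `dim` means and `dim` variances; the mixture
    weights sum to one, so only n_components - 1 of them are free.
    """
    return n_components * (2 * dim + 1) - 1


def bic(log_likelihood, n_params, n_frames):
    """Textbook Bayesian Information Criterion; lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_frames)


def pick_num_components(log_likelihoods, dim, n_frames):
    """Return the candidate mixture size with the lowest BIC.

    `log_likelihoods` maps a candidate number of components to the
    training log-likelihood of the model fitted with that many components.
    """
    return min(
        log_likelihoods,
        key=lambda m: bic(log_likelihoods[m], gmm_param_count(m, dim), n_frames),
    )
```

For example, with hypothetical log-likelihoods `{1: -5200.0, 2: -5000.0, 4: -4990.0}`, two-dimensional features and 1000 frames, `pick_num_components` selects 2: the small likelihood gain from four components does not pay for the extra parameters.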
Authors:
Jayadev Billa,
Thomas Colthurst,
Amro El-Jaroudi,
Rukmini Iyer,
Kristine Ma,
Spyros Matsoukas,
Carl Quillen,
Fred Richardson,
Man-Hung Siu,
George Zavaliagkos,
Herbert Gish,
Page (NA) Paper number 2390
Abstract:
This paper describes the improvements that resulted in the 1998 Byblos
Large Vocabulary Conversational Speech Recognition (LVCSR) System.
Salient among these improvements are: improved signal processing, improved
Hidden Markov Model (HMM) topology, use of quinphone context, introduction
of diagonal speaker adapted training (DSAT), incorporation of variance
adaptation in the MLLR framework, improvements in language modeling,
increase in lexicon size and combination of multiple systems. These
changes resulted in about a 7% absolute reduction in word error rates
on a balanced Switchboard/Callhome English test set.
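Since the result is quoted as an absolute reduction in word error rate, a short sketch of the standard definition may help: WER is the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. The function names are illustrative, and the figures in the usage note are made up to show the difference between absolute and relative reductions; they are not results from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def reduction(old_wer, new_wer):
    """Return the (absolute, relative) reduction in word error rate."""
    absolute = old_wer - new_wer
    return absolute, absolute / old_wer
```

With hypothetical rates of 0.40 before and 0.33 after, `reduction(0.40, 0.33)` gives an absolute reduction of about 0.07 (7 points) and a relative reduction of about 17.5%.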
Authors:
Martine Adda-Decker,
Gilles Adda,
Jean-Luc S Gauvain,
Lori F Lamel,
Page (NA) Paper number 2250
Abstract:
In this contribution we present some design considerations concerning
our large vocabulary continuous speech recognition system in French.
The impact of the epoch of the text training material on lexical coverage,
language model perplexity and recognition performance on newspaper
texts is demonstrated. The effectiveness of larger vocabulary sizes
and larger text training corpora for language modeling is investigated.
French is a highly inflected language producing large lexical variety
and a high homophone rate. About 30% of recognition errors are shown
to be due to substitutions between inflected forms of a given root
form. When word error rates are analysed as a function of word frequency,
a significant increase in the error rate can be measured for frequency
ranks above 5000.
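Lexical coverage, as discussed above, is commonly measured as the fraction of running words in a test text that belong to a vocabulary built from the most frequent training words. The sketch below follows that common definition; the abstract does not describe the exact procedure, so the function names and the toy French sentences in the usage note are purely illustrative.

```python
from collections import Counter


def build_vocabulary(training_words, size):
    """Keep the `size` most frequent words of the training text."""
    counts = Counter(training_words)
    return {word for word, _ in counts.most_common(size)}


def lexical_coverage(vocabulary, test_words):
    """Fraction of running test words that are in the vocabulary.

    One minus this value is the out-of-vocabulary (OOV) rate.
    """
    if not test_words:
        raise ValueError("test text must contain at least one word")
    covered = sum(1 for word in test_words if word in vocabulary)
    return covered / len(test_words)
```

For instance, a two-word vocabulary built from `"le chat mange le poisson le chat dort".split()` is `{"le", "chat"}`, which covers 3 of the 5 running words in `"le chien mange le chat".split()`, a coverage of 0.6.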
Authors:
Sue E Johnson,
Pierre Jourlin,
Gareth L Moore,
Karen Spärck Jones,
Philip C Woodland,
Page (NA) Paper number 2304
Abstract:
This paper describes the spoken document retrieval system that we have
been developing and assesses its performance using automatic transcriptions
of about 50 hours of broadcast news data. The recognition engine is
based on the HTK broadcast news transcription system and the retrieval
engine is based on the techniques developed at City University. The
retrieval performance over a wide range of speech transcription error
rates is presented and a number of recognition error metrics that more
accurately reflect the impact of transcription errors on retrieval
accuracy are defined and computed. The results demonstrate the importance
of high accuracy automatic transcription. The final system is currently
being evaluated on the 1998 TREC-7 spoken document retrieval task.
Authors:
Barbara Peskin,
Michael Newman,
Don McAllaster,
Venkatesh Nagesha,
Hywel B Richards, Dragon Systems UK (U.K.)
Steven Wegmann,
Melvyn Hunt, Dragon Systems UK (U.K.)
Larry Gillick,
Page (NA) Paper number 1922
Abstract:
This paper describes recent changes in Dragon's speech recognition
system which have markedly improved performance on conversational telephone
speech. Key changes include: the conversion to modified PLP-based cepstra
from mel-cepstra; the replacement of our usual IMELDA transform by
a new transform using "semi-tied covariance"; a new multi-pass adaptation
protocol; probabilities on alternate pronunciations in the lexicon;
the addition of word-boundary tags in our acoustic models and the redistribution
of model parameters to build fewer output distributions but with more
mixture components per model.
Authors:
Thomas Hain,
Philip C Woodland,
Thomas R Niesler,
Edward W.D Whittaker,
Page (NA) Paper number 2311
Abstract:
This paper describes the 1998 HTK large vocabulary speech recognition
system for conversational telephone speech as used in the NIST 1998
Hub5E evaluation. Front-end and language modelling experiments conducted
using various training and test sets from both the Switchboard and
Callhome English corpora are presented. Our complete system includes
reduced bandwidth analysis, side-based cepstral feature normalisation,
vocal tract length normalisation (VTLN), triphone and quinphone hidden
Markov models (HMMs) built using speaker adaptive training (SAT), maximum
likelihood linear regression (MLLR) speaker adaptation and a confidence
score based system combination. A detailed description of the complete
system together with experimental results for each stage of our multi-pass
decoding scheme is presented. The word error rate obtained is almost
20% better than our 1997 system on the development set.
Authors:
James R Glass,
Timothy J Hazen,
I. Lee Hetherington,
Page (NA) Paper number 2464
Abstract:
This paper describes our experiences with developing a real-time telephone-based
speech recognizer as part of a conversational system in the weather
information domain. This system has been used to collect spontaneous
speech data which has proven to be extremely valuable for research
in a number of different areas. After describing the corpus we have
collected, we describe the development of the recognizer vocabulary,
pronunciations, language and acoustic models for this system, the new
weighted finite-state transducer-based lexical access component, and
report on the current performance of the recognizer under several different
conditions. We also analyze recognition latency to verify that the
system performs in real time.