Abstract: Session SP-2 |
|
SP-2.1
|
Progress in Broadcast News Transcription at Dragon Systems
Steven Wegmann,
Puming Zhan,
Larry Gillick (Dragon Systems, Inc.)
In this paper we shall report on recent progress in acoustic modelling and preprocessing in our Broadcast News transcription system. We have gone back to basics in acoustic modelling, and re-examined some of our standard practices, in particular the use of IMELDA and frequency warping, in the context of the Broadcast News corpus. We shall also report on some preliminary experiments with a generalization of IMELDA, "semi-tied covariances". In combination, these improvements lead to a 3.5% absolute improvement over our eval97 models. We shall also describe our attempts to fix our rather primitive, silence-based preprocessing system, including initial results using a new speaker-change detection algorithm based on Hotelling's T²-test.
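For readers unfamiliar with the statistic named in this abstract, the following is a minimal sketch of the textbook two-sample Hotelling T² statistic applied to two adjacent windows of feature frames. It is not the authors' algorithm: the function names, the use of a pooled full covariance, and the idea of comparing two fixed windows are assumptions made for illustration.

```python
def mean_vector(frames):
    """Per-dimension mean of a list of equal-length feature vectors."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def scatter_matrix(frames, mean):
    """Sum of outer products of the mean-removed frames."""
    dim = len(mean)
    s = [[0.0] * dim for _ in range(dim)]
    for f in frames:
        diff = [f[d] - mean[d] for d in range(dim)]
        for i in range(dim):
            for j in range(dim):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def hotelling_t2(left, right):
    """Two-sample Hotelling T^2 between two windows of frames.

    T^2 = (n1 * n2 / (n1 + n2)) * d^T S^-1 d, where d is the difference
    of the window means and S is the pooled sample covariance.
    """
    n1, n2 = len(left), len(right)
    m1, m2 = mean_vector(left), mean_vector(right)
    s1, s2 = scatter_matrix(left, m1), scatter_matrix(right, m2)
    dim = len(m1)
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(dim)]
              for i in range(dim)]
    diff = [m1[d] - m2[d] for d in range(dim)]
    weighted = solve(pooled, diff)  # S^-1 d without forming the inverse
    return (n1 * n2 / (n1 + n2)) * sum(diff[d] * weighted[d] for d in range(dim))
```

A large value suggests that the two windows have different means, which is one possible cue for a speaker change; in practice the decision threshold would have to be tuned on held-out data.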
|
SP-2.2
|
Recent Improvements to IBM's Speech Recognition System for Automatic
Transcription of Broadcast News
Scott S Chen,
Ellen M Eide,
Mark Gales,
Ramesh A Gopinath,
Dimitri Kanevsky,
Peder A Olsen (IBM)
We describe recent extensions and improvements to
IBM's system for automatic transcription of broadcast
news. The speech recognizer uses a total of 160 hours
of acoustic transcription, 80 hours more than for the
1997 Hub4 evaluation. In addition to improvements
obtained in 1997 we made a number of changes and
algorithmic enhancements. Among these were changing
the acoustic vocabulary, reducing the number of
phonemes, insertion of short pauses, mixture models
consisting of non-Gaussian components, pronunciation
networks, factor analysis (FACILT) and the Bayesian
Information Criterion (BIC) applied to choosing
the number of components in a Gaussian mixture model.
The models were combined in a single system using
NIST's script voting machine known as ROVER.
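As a rough illustration of how BIC can be used to choose the number of mixture components, the sketch below scores candidate model sizes from their training log-likelihoods and picks the smallest BIC. It uses the standard definition BIC = -2 log L + k log N; the diagonal-covariance parameter count and all function names are assumptions, not details from the paper.

```python
import math


def gmm_param_count(num_components, dim):
    """Free parameters of a diagonal-covariance Gaussian mixture.

    Each component has `dim` means and `dim` variances; the mixture
    weights sum to one, leaving num_components - 1 free weights.
    """
    return num_components * 2 * dim + (num_components - 1)


def bic(log_likelihood, num_params, num_frames):
    """Standard BIC: lower is better."""
    return -2.0 * log_likelihood + num_params * math.log(num_frames)


def select_num_components(log_likelihoods, dim, num_frames):
    """Pick the component count with the lowest BIC.

    `log_likelihoods` maps a candidate component count to the total
    training log-likelihood of the mixture trained with that count.
    Returns the chosen count and the full table of BIC scores.
    """
    scores = {
        m: bic(ll, gmm_param_count(m, dim), num_frames)
        for m, ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

The penalty term grows with the number of parameters, so a larger mixture is selected only when its gain in likelihood outweighs the added complexity.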
|
SP-2.3
|
Recent Experiments in Large Vocabulary Conversational Speech Recognition
Jayadev Billa,
Thomas Colhurst,
Amro El-Jaroudi,
Rukmini Iyer,
Kristine Ma,
Carl Quillen,
Fred Richardson,
Manhung Siu,
George Zavaliagkos,
Herb Gish (BBN Technologies)
This paper describes the improvements that resulted in the 1998
Byblos Large Vocabulary Conversational Speech Recognition (LVCSR)
System. Salient among these improvements are: improved signal
processing, improved Hidden Markov Model (HMM) topology, use of
quinphone context, introduction of diagonal speaker adapted training
(DSAT), incorporation of variance adaptation in the MLLR framework,
improvements in language modeling, increase in lexicon size and
combination of multiple systems. These changes resulted in about a
7% absolute reduction in word error rates on a balanced
Switchboard/Callhome English test set.
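Several abstracts in this session quote word error rate (WER) changes, some as absolute and some as relative figures. The sketch below shows the usual edit-distance definition of WER and the difference between the two kinds of reduction. It is a simplified illustration with assumed function names; official evaluations use NIST scoring tools with text normalisation that is not reproduced here.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length.

    Computed as the word-level Levenshtein distance between the two
    whitespace-tokenised strings. The reference must not be empty.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)


def absolute_reduction(old_wer, new_wer):
    """Difference in WER, in the same units as the inputs."""
    return old_wer - new_wer


def relative_reduction(old_wer, new_wer):
    """Reduction expressed as a fraction of the old WER."""
    return (old_wer - new_wer) / old_wer
```

For example, moving from a WER of 0.50 to 0.43 (made-up figures) is a 7-point absolute reduction but a 14% relative reduction, so the two kinds of figure are not directly comparable.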
|
SP-2.4
|
Large vocabulary speech recognition in French
Martine Adda-Decker,
Gilles Adda,
Jean-Luc S Gauvain,
Lori F Lamel (LIMSI-CNRS)
In this contribution we present some design considerations concerning
our large vocabulary continuous speech recognition system in French.
The impact of the epoch of the text training material on lexical
coverage, language model perplexity and recognition performance on
newspaper texts is demonstrated. The effectiveness of larger
vocabulary sizes and larger text training corpora for language
modeling is investigated. French is a highly inflected language
producing large lexical variety and a high homophone rate. About 30%
of recognition errors are shown to be due to substitutions between
inflected forms of a given root form. When word error rates are
analysed as a function of word frequency, a significant increase in
the error rate can be measured for frequency ranks above 5000.
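Lexical coverage, as discussed in this abstract, can be measured by taking the N most frequent training words as the vocabulary and counting how many test tokens fall inside it. The sketch below is a minimal version of that measurement; the function names are hypothetical and the tokenisation is assumed to have been done already.

```python
from collections import Counter


def build_vocabulary(training_words, size):
    """Return the `size` most frequent words of the training text."""
    counts = Counter(training_words)
    return {word for word, _ in counts.most_common(size)}


def lexical_coverage(vocabulary, test_words):
    """Fraction of test tokens that are in the vocabulary."""
    covered = sum(1 for word in test_words if word in vocabulary)
    return covered / len(test_words)


def oov_rate(vocabulary, test_words):
    """Out-of-vocabulary rate, the complement of lexical coverage."""
    return 1.0 - lexical_coverage(vocabulary, test_words)
```

Repeating the measurement for several vocabulary sizes, or for training texts from different periods, gives the kind of coverage curves the abstract refers to.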
|
SP-2.5
|
The Cambridge University Spoken Document Retrieval System
Sue E Johnson (Cambridge University Engineering Department),
Pierre Jourlin (Cambridge University Computer Laboratory),
Gareth L Moore (Cambridge University Engineering Department),
Karen Sparck Jones (Cambridge University Computer Laboratory),
Philip C Woodland (Cambridge University Engineering Department)
This paper describes the spoken document retrieval system that we have
been developing and assesses its performance using automatic
transcriptions of about 50 hours of broadcast news data. The
recognition engine is based on the HTK broadcast news transcription
system and the retrieval engine is based on the techniques developed at
City University. The retrieval performance over a wide range of speech
transcription error rates is presented and a number of recognition error
metrics that more accurately reflect the impact of transcription errors on
retrieval accuracy are defined and computed. The results demonstrate the
importance of high accuracy automatic transcription. The final system
is currently being evaluated on the 1998 TREC-7 spoken document retrieval
task.
|
SP-2.6
|
Improvements in Recognition of Conversational Telephone Speech
Barbara Peskin,
Michael Newman,
Don McAllaster,
Venkatesh Nagesha (Dragon Systems, Inc.),
Hywel Richards (Dragon Systems UK),
Steven Wegmann (Dragon Systems, Inc.),
Melvyn Hunt (Dragon Systems UK),
Larry Gillick (Dragon Systems, Inc.)
This paper describes recent changes in Dragon's speech recognition system which have markedly improved performance on conversational telephone speech. Key changes include: the conversion to modified PLP-based cepstra from mel-cepstra; the replacement of our usual IMELDA transform by a new transform using "semi-tied covariance"; a new multi-pass adaptation protocol; probabilities on alternate pronunciations in the lexicon; the addition of word-boundary tags in our acoustic models; and the redistribution of model parameters to build fewer output distributions but with more mixture components per model.
|
SP-2.7
|
The 1998 HTK System for Transcription of Conversational Telephone Speech
Thomas Hain,
Philip C Woodland,
Thomas R Niesler,
Edward W.D Whittaker (Cambridge University Engineering Department)
This paper describes the 1998 HTK large vocabulary speech recognition system
for conversational telephone speech as used in the NIST 1998 Hub5E evaluation.
Front-end and language modelling experiments conducted using various training
and test sets from both the Switchboard and Callhome English corpora are
presented. Our complete system includes reduced bandwidth analysis, side-based
cepstral feature normalisation, vocal tract length normalisation (VTLN),
triphone and quinphone hidden Markov models (HMMs) built using speaker
adaptive training (SAT), maximum likelihood linear regression (MLLR) speaker
adaptation and a confidence score based system combination. A detailed
description of the complete system together with experimental results for each
stage of our multi-pass decoding scheme is presented. The word error rate
obtained is almost 20% better than our 1997 system on the
development set.
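The abstract does not spell out how side-based cepstral normalisation is performed. One common reading, assumed here, is that the mean and variance of each cepstral coefficient are estimated over all frames of one conversation side and used to normalise that side. The sketch below follows that assumption and is not the HTK implementation.

```python
import math


def normalise_side(frames):
    """Mean and variance normalisation over one conversation side.

    `frames` is a list of equal-length feature vectors from a single
    side. Each dimension is shifted to zero mean and scaled to unit
    variance; a dimension with zero variance is only mean-shifted.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [
        sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dim)
    ]
    scales = [math.sqrt(v) if v > 0.0 else 1.0 for v in variances]
    return [
        [(f[d] - means[d]) / scales[d] for d in range(dim)] for f in frames
    ]
```

Estimating the statistics per side rather than per utterance gives more frames per estimate, which may make the normalisation more stable on short conversational turns.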
|
SP-2.8
|
Real-Time Telephone-Based Speech Recognition in the Jupiter Domain
James R Glass,
Timothy J Hazen,
I. Lee Hetherington (MIT Laboratory for Computer Science)
This paper describes our experiences with
developing a real-time telephone-based speech
recognizer as part of a conversational system in
the weather information domain. This system has
been used to collect spontaneous speech data which
has proven to be extremely valuable for research
in a number of different areas. After describing
the corpus we have collected, we describe the
development of the recognizer vocabulary,
pronunciations, language and acoustic models for
this system, the new weighted finite-state
transducer-based lexical access component, and
report on the current performance of the
recognizer under several different conditions. We
also analyze recognition latency to verify that
the system performs in real time.
|
|