ICASSP99 Large Vocabulary Recognition

Large Vocabulary Recognition

Progress in Broadcast News Transcription at Dragon Systems
Authors: Steven Wegmann, Puming Zhan, Larry Gillick
Paper number: 1912
Abstract: In this paper we shall report on recent progress in acoustic modelling and preprocessing in our Broadcast News transcription system. We have gone back to basics in acoustic modelling, and re-examined some of our standard practices, in particular the use of IMELDA and frequency warping, in the context of the Broadcast News corpus. We shall also report on some preliminary experiments with a generalization of IMELDA, "semi-tied covariances". In combination, these improvements lead to a 3.5% absolute improvement over our eval97 models. We shall also describe our attempts to fix our rather primitive, silence-based preprocessing system, including initial results using a new speaker-change detection algorithm based on Hotelling's T²-test.

Recent Improvements to IBM's Speech Recognition System for Automatic Transcription of Broadcast News
Authors: Scott S Chen, Ellen M Eide, Mark J.F. Gales, Ramesh A Gopinath, Dimitri Kanevsky, Peder A Olsen
Paper number: 2096
Abstract: We describe recent extensions and improvements to IBM's system for automatic transcription of broadcast news. The speech recognizer uses a total of 160 hours of acoustic transcription, 80 hours more than for the 1997 Hub4 evaluation. In addition to improvements obtained in 1997, we made a number of changes and algorithmic enhancements. Among these were changing the acoustic vocabulary, reducing the number of phonemes, insertion of short pauses, mixture models consisting of non-Gaussian components, pronunciation networks, factor analysis (FACILT) and the Bayesian Information Criterion (BIC) applied to choosing the number of components in a Gaussian mixture model. The models were combined in a single system using NIST's script voting machine known as ROVER.
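The IBM abstract above mentions using BIC to choose the number of components in a Gaussian mixture model. The following is a minimal sketch of that general idea only, not the procedure used in the paper: it assumes diagonal-covariance mixtures, the textbook form BIC = -2 log L + k ln N (lower is better), and a hypothetical table of training log-likelihoods. All function names and numbers are illustrative assumptions.

```python
import math


def gmm_param_count(n_components: int, dim: int) -> int:
    """Free parameters of a diagonal-covariance GMM (assumed model form).

    Each component has `dim` means and `dim` variances; the mixture
    weights sum to one, so only n_components - 1 of them are free.
    """
    return n_components * dim * 2 + (n_components - 1)


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Textbook BIC in the 'lower is better' convention."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def select_components(log_likelihoods: dict, dim: int, n_samples: int) -> int:
    """Return the candidate component count with the smallest BIC."""
    scores = {
        m: bic(ll, gmm_param_count(m, dim), n_samples)
        for m, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get)


# Hypothetical log-likelihoods of the same 1000 two-dimensional frames
# under mixtures with 1, 2, 4 and 8 components (made-up values).
candidate_log_likelihoods = {1: -3200.0, 2: -2900.0, 4: -2880.0, 8: -2870.0}
best = select_components(candidate_log_likelihoods, dim=2, n_samples=1000)
print(best)
```

In this toy table the likelihood keeps rising as components are added, but beyond two components the gain no longer outweighs the parameter penalty, so the two-component model is selected.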

Recent Experiments in Large Vocabulary Conversational Speech Recognition
Authors: Jayadev Billa, Thomas Colhurst, Amro El-Jaroudi, Rukmini Iyer, Kristine Ma, Spyros Matsoukas, Carl Quillen, Fred Richardson, Man-Hung Siu, George Zavaliagkos, Herbert Gish
Paper number: 2390
Abstract: This paper describes the improvements that resulted in the 1998 Byblos Large Vocabulary Conversational Speech Recognition (LVCSR) System. Salient among these improvements are: improved signal processing, improved Hidden Markov Model (HMM) topology, use of quinphone context, introduction of diagonal speaker adapted training (DSAT), incorporation of variance adaptation in the MLLR framework, improvements in language modeling, increase in lexicon size and combination of multiple systems. These changes resulted in about a 7% absolute reduction in word error rates on a balanced Switchboard/Callhome English test set.

Large Vocabulary Speech Recognition In French
Authors: Martine Adda-Decker, Gilles Adda, Jean-Luc S Gauvain, Lori F Lamel
Paper number: 2250
Abstract: In this contribution we present some design considerations concerning our large vocabulary continuous speech recognition system in French. The impact of the epoch of the text training material on lexical coverage, language model perplexity and recognition performance on newspaper texts is demonstrated. The effectiveness of larger vocabulary sizes and larger text training corpora for language modeling is investigated. French is a highly inflected language producing large lexical variety and a high homophone rate. About 30% of recognition errors are shown to be due to substitutions between inflected forms of a given root form. When word error rates are analysed as a function of word frequency, a significant increase in the error rate can be measured for frequency ranks above 5000.
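The French abstract above discusses lexical coverage as a function of vocabulary size. As a rough illustration of how such a measurement is commonly set up (a sketch under assumptions, not the authors' tooling), the snippet below builds a vocabulary from the N most frequent training words and reports the out-of-vocabulary (OOV) rate on test text. The function names and the toy sentences are hypothetical.

```python
from collections import Counter


def top_n_vocabulary(training_tokens, n):
    """Return the set of the n most frequent words in the training text."""
    counts = Counter(training_tokens)
    return {word for word, _ in counts.most_common(n)}


def oov_rate(vocabulary, test_tokens):
    """Fraction of test tokens that are not in the vocabulary."""
    if not test_tokens:
        raise ValueError("test_tokens must not be empty")
    missing = sum(1 for token in test_tokens if token not in vocabulary)
    return missing / len(test_tokens)


# Toy data (made up). 'mangeait' is an inflected form of 'mange' that
# never occurs in training, so it stays OOV even with the full vocabulary.
train = "le chat mange le poisson le chien mange".split()
test = "le chien mangeait le poisson".split()

small_vocab = top_n_vocabulary(train, 2)   # {'le', 'mange'}
full_vocab = top_n_vocabulary(train, 5)    # every training word

print(oov_rate(small_vocab, test))
print(oov_rate(full_vocab, test))
```

Lexical coverage is one minus the OOV rate. The toy example also hints at why a highly inflected language needs a larger vocabulary: each unseen inflected form counts as a separate missing word.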

The Cambridge University Spoken Document Retrieval System
Authors: Sue E Johnson, Pierre Jourlin, Gareth L Moore, Karen Spärck Jones, Philip C Woodland
Paper number: 2304
Abstract: This paper describes the spoken document retrieval system that we have been developing and assesses its performance using automatic transcriptions of about 50 hours of broadcast news data. The recognition engine is based on the HTK broadcast news transcription system and the retrieval engine is based on the techniques developed at City University. The retrieval performance over a wide range of speech transcription error rates is presented and a number of recognition error metrics that more accurately reflect the impact of transcription errors on retrieval accuracy are defined and computed. The results demonstrate the importance of high accuracy automatic transcription. The final system is currently being evaluated on the 1998 TREC-7 spoken document retrieval task.

Improvements in Recognition of Conversational Telephone Speech
Authors: Barbara Peskin, Michael Newman, Don McAllaster, Venkatesh Nagesha, Hywel B Richards (Dragon Systems UK, U.K.), Steven Wegmann, Melvyn Hunt (Dragon Systems UK, U.K.), Larry Gillick
Paper number: 1922
Abstract: This paper describes recent changes in Dragon's speech recognition system which have markedly improved performance on conversational telephone speech. Key changes include: the conversion to modified PLP-based cepstra from mel-cepstra; the replacement of our usual IMELDA transform by a new transform using "semi-tied covariance"; a new multi-pass adaptation protocol; probabilities on alternate pronunciations in the lexicon; the addition of word-boundary tags in our acoustic models; and the redistribution of model parameters to build fewer output distributions but with more mixture components per model.

The 1998 HTK System for Transcription of Conversational Telephone Speech
Authors: Thomas Hain, Philip C Woodland, Thomas R Niesler, Edward W.D. Whittaker
Paper number: 2311
Abstract: This paper describes the 1998 HTK large vocabulary speech recognition system for conversational telephone speech as used in the NIST 1998 Hub5E evaluation. Front-end and language modelling experiments conducted using various training and test sets from both the Switchboard and Callhome English corpora are presented. Our complete system includes reduced bandwidth analysis, side-based cepstral feature normalisation, vocal tract length normalisation (VTLN), triphone and quinphone hidden Markov models (HMMs) built using speaker adaptive training (SAT), maximum likelihood linear regression (MLLR) speaker adaptation and a confidence score based system combination. A detailed description of the complete system together with experimental results for each stage of our multi-pass decoding scheme is presented. The word error rate obtained is almost 20% better than our 1997 system on the development set.

Real-Time Telephone-Based Speech Recognition in the Jupiter Domain
Authors: James R Glass, Timothy J Hazen, I. Lee Hetherington
Paper number: 2464
Abstract: This paper describes our experiences with developing a real-time telephone-based speech recognizer as part of a conversational system in the weather information domain. This system has been used to collect spontaneous speech data which has proven to be extremely valuable for research in a number of different areas. After describing the corpus we have collected, we describe the development of the recognizer vocabulary, pronunciations, language and acoustic models for this system, the new weighted finite-state transducer-based lexical access component, and report on the current performance of the recognizer under several different conditions. We also analyze recognition latency to verify that the system performs in real time.
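
Several abstracts in this session report recognizer performance as word error rate (WER), quoted sometimes as an absolute and sometimes as a relative change. The sketch below shows the standard word-level edit-distance definition of WER and the difference between the two kinds of reduction. It is an illustration only: the helper names, the example sentences and the 50% and 40% figures are hypothetical and are not taken from any of the papers.

```python
def word_errors(reference, hypothesis):
    """Minimum number of substitutions, deletions and insertions
    (Levenshtein distance) needed to turn the hypothesis into the
    reference, computed over words."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, 1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word errors divided by reference length."""
    return word_errors(reference, hypothesis) / len(reference)


def absolute_reduction(old_wer, new_wer):
    """Difference in WER, in the same units as the inputs."""
    return old_wer - new_wer


def relative_reduction(old_wer, new_wer):
    """Reduction expressed as a fraction of the old WER."""
    return (old_wer - new_wer) / old_wer


ref = "what is the weather in boston".split()
hyp = "what is weather in austin".split()
print(word_errors(ref, hyp), wer(ref, hyp))

# Hypothetical figures: going from 50% to 40% WER is a 10-point
# absolute reduction but a 20% relative reduction.
print(absolute_reduction(0.50, 0.40), relative_reduction(0.50, 0.40))
```

The same improvement therefore looks twice as large in this example when stated in relative terms, which is worth keeping in mind when comparing figures across abstracts.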