Authors:
Chung-Hsien Wu, Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.
Jau-Hung Chen, Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.
Paper number 1360
Abstract:
In this paper, a template-driven generation of prosodic information
is proposed for Chinese text-to-speech conversion. A set of monosyllable-based
synthesis units is selected from a large continuous speech database.
The speech database is employed to establish a word-prosody-based template
tree according to the linguistic features: tone combination, word length,
part-of-speech (POS) of the word, and word position in a sentence.
This template tree stores the prosodic features including pitch contour,
average energy, and syllable duration of a word for possible combinations
of linguistic features. Two modules for sentence intonation and template
selection are proposed to generate the target prosodic templates. The
experimental results for the TTS conversion system showed that the synthesized
prosodic features closely resembled their original counterparts for most
syllables in the inside (closed) test. Subjective listening experiments
also confirmed the satisfactory performance of these approaches.
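The template-tree lookup described above can be sketched as a keyed table with a simple back-off; the feature values, template contents, and back-off rule below are illustrative assumptions, not the authors' actual data.

```python
# Hypothetical sketch of a word-prosody template tree: templates are
# indexed by the four linguistic features named in the abstract
# (tone combination, word length, POS, word position in the sentence).
# All feature values and template numbers are invented for illustration.

template_tree = {
    # (tone_combination, word_length, pos, position) -> prosodic template
    ("T2-T4", 2, "N", "initial"): {
        "pitch_contour": [220.0, 215.0, 180.0, 160.0],  # Hz, per frame
        "avg_energy": 62.0,                             # dB
        "durations": [0.21, 0.18],                      # s, per syllable
    },
    ("T2-T4", 2, "N", "final"): {
        "pitch_contour": [210.0, 200.0, 170.0, 140.0],
        "avg_energy": 58.0,
        "durations": [0.23, 0.26],
    },
}

def select_template(tone_comb, length, pos, position):
    """Return the stored template, backing off over word position
    when the exact feature combination is missing."""
    key = (tone_comb, length, pos, position)
    if key in template_tree:
        return template_tree[key]
    # Simple back-off: ignore word position.
    for (tc, ln, p, _), tmpl in template_tree.items():
        if (tc, ln, p) == (tone_comb, length, pos):
            return tmpl
    return None

t = select_template("T2-T4", 2, "N", "medial")  # falls back over position
```

A real system would back off over several features in a fixed order; this sketch only shows the idea of feature-keyed template selection.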
Authors:
Hiroshi Saruwatari,
Shoji Kajita,
Kazuya Takeda,
Fumitada Itakura,
Paper number 1669
Abstract:
This paper describes an improved spectral subtraction method by using
the complementary beamforming microphone array to enhance noisy speech
signals for speech recognition. The complementary beamforming is based
on two types of beamformers designed to obtain complementary directivity
patterns with respect to each other. In this paper, it is shown that
nonlinear subtraction processing with complementary beamforming
can realize a form of spectral subtraction without the need for
speech-pause detection. In addition, the design of the optimization
algorithm for the directivity pattern is also described. To evaluate
the effectiveness, speech enhancement experiments and speech recognition
experiments are performed based on computer simulations. In comparison
with the optimized conventional delay-and-sum array, it is shown that
the proposed array improves the signal-to-noise ratio of degraded speech
by about 2 dB and improves word recognition rates by about 10%
under heavily noisy conditions.
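For context, the conventional delay-and-sum baseline the abstract compares against can be sketched as follows; the two-microphone geometry, source angle, and integer-sample delay approximation are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Illustrative delay-and-sum beamformer (the conventional baseline the
# abstract compares against), not the authors' complementary design.
# Geometry and signals are made up for the example.

fs = 16000
c = 343.0                 # speed of sound, m/s
d = 0.05                  # mic spacing, m
theta = np.deg2rad(30)    # assumed source direction

# Two-microphone signals: mic 1 receives mic 0's signal delayed by tau.
tau = d * np.sin(theta) / c
n = np.arange(1024)
s = np.sin(2 * np.pi * 500 * n / fs)
delay = int(round(tau * fs))   # integer-sample approximation
x0 = s
x1 = np.roll(s, delay)

# Steer the array toward theta: advance mic 1 by the same delay, then
# average; signals from the look direction add coherently.
y = 0.5 * (x0 + np.roll(x1, -delay))
```

Signals arriving from other directions add with a residual phase offset and are attenuated rather than cancelled, which is why delay-and-sum alone gives only modest noise reduction.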
Authors:
David C Smith,
Jeffrey Townsend,
Douglas J Nelson,
Dan Richman,
Paper number 1756
Abstract:
Computationally efficient speech extraction algorithms have significant
potential economic benefit, since they automate an extremely tedious manual
process. Previously developed algorithms discriminate between speech and
one specific other signal type, and often fail when that specific
non-speech signal is replaced by a different signal type. Moreover,
combining several such signal-specific discriminators has produced
predictably negative results. When the number of discriminating
features is large, compression methods such as Principal Components
have been applied to reduce dimension, even though information may
be lost in the process. In this paper, graphical tools are applied
to determine a set of features which produce excellent speech vs. non-speech
clustering. This cluster structure provides the basis for a general
speech vs. non-speech discriminator, which significantly outperforms
the TALKATIVE speech extraction algorithm.
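The abstract does not name the selected features, and the TALKATIVE algorithm is not described here; as a hypothetical stand-in, two classic frame features (zero-crossing rate and log energy) already separate a tonal signal from low-level noise in a toy feature space.

```python
import numpy as np

# Hypothetical two-feature illustration of speech vs. non-speech
# clustering. ZCR and log-energy are stand-ins; the paper's actual
# graphically selected feature set is not given in the abstract.

def frame_features(x, frame_len=256):
    feats = []
    for i in range(0, len(x) - frame_len + 1, frame_len):
        f = x[i:i + frame_len]
        zcr = np.mean(np.abs(np.diff(np.sign(f))) > 0)   # zero-crossing rate
        energy = 10 * np.log10(np.mean(f ** 2) + 1e-12)  # log energy, dB
        feats.append((zcr, energy))
    return np.array(feats)

rng = np.random.default_rng(0)
n = np.arange(4096)
voiced_like = np.sin(2 * np.pi * 150 * n / 8000)   # low ZCR, high energy
noise_like = 0.05 * rng.standard_normal(4096)      # high ZCR, low energy

fv = frame_features(voiced_like)
fn = frame_features(noise_like)
# The two signal types occupy different regions of the (ZCR, energy) plane.
```

In a feature space with such cluster structure, even a simple linear boundary discriminates the two classes; the point of the paper is choosing features that produce this structure for speech against many signal types at once.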
Authors:
Jeff Kuo,
Eva B. Holmberg,
Robert E. Hillman,
Paper number 1789
Abstract:
This paper demonstrates that linear discriminant analysis using aerodynamic
and acoustic features is effective in discriminating speakers with
vocal-fold nodules from normal speakers. Simultaneous aerodynamic and
acoustic measurements of vocal function were taken of 14 women with
bilateral vocal-fold nodules and 12 women with normal voice production.
Features were extracted from the glottal airflow waveform and peaks
in the acoustic spectrum for the vowel /æ/. Results show that
the subglottal pressure, air flow, and open quotient are increased
in the nodules group. Estimated first-formant bandwidths are increased,
but result in minimal change in the first-formant amplitudes. There
is no appreciable decrease in high frequency energy. Speakers with
nodules may be compensating for the nodules by increasing the subglottal
pressure, resulting in relatively good acoustics but increased air
flows. The two best features for discrimination are open quotient and
subglottal pressure.
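A minimal two-class Fisher discriminant on the two best features named above (open quotient and subglottal pressure) can be sketched as follows; the group means, feature scales, and sample draws are fabricated for illustration and do not reproduce the study's measurements.

```python
import numpy as np

# Minimal two-class Fisher linear discriminant on synthetic data,
# sketching the kind of analysis the abstract describes. Only the two
# best features (open quotient, subglottal pressure) are used, per the
# abstract's conclusion; all numeric values are invented.

rng = np.random.default_rng(1)
# columns: [open quotient, subglottal pressure] -- fabricated scales,
# elevated in the nodules group as the abstract reports
nodules = rng.normal([0.75, 9.0], [0.05, 1.0], size=(14, 2))
normals = rng.normal([0.60, 6.0], [0.05, 1.0], size=(12, 2))

def fisher_direction(a, b):
    """Fisher LDA projection w = Sw^{-1} (mu_a - mu_b),
    where Sw is the pooled within-class scatter matrix."""
    sw = np.cov(a.T) * (len(a) - 1) + np.cov(b.T) * (len(b) - 1)
    return np.linalg.solve(sw, a.mean(0) - b.mean(0))

w = fisher_direction(nodules, normals)
# Classify by projecting onto w and thresholding at the midpoint of
# the two projected class means.
threshold = 0.5 * (nodules @ w).mean() + 0.5 * (normals @ w).mean()
pred_nodules = (nodules @ w) > threshold
pred_normals = (normals @ w) > threshold
```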
Authors:
Kenji Matsui,
Noriyo Hara,
Paper number 1831
Abstract:
The feasibility of using the formant analysis-synthesis approach to
replace the voicing sources of esophageal speech was explored. The
voicing sources were generated by using inverse-filtered signals extracted
from normal speakers. Various pitch extraction methods were tested,
and a simple autocorrelation method was chosen. A special hardware
unit was designed to perform the analysis-synthesis process in real
time. Results of a subjective test showed that the
synthesized speech was significantly improved.
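A simple autocorrelation pitch estimator of the kind the abstract says was chosen can be sketched as follows; the frame length, search range, and test tone are illustrative, not the paper's settings.

```python
import numpy as np

# Autocorrelation pitch extraction: the F0 estimate is the sampling
# rate divided by the lag at which the frame's autocorrelation peaks,
# searched over a plausible pitch range.

def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Return the F0 (Hz) whose lag maximizes the autocorrelation."""
    frame = frame - frame.mean()
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)  # lag search bounds
    lag = lo + int(np.argmax(r[lo:hi]))
    return fs / lag

fs = 8000
n = np.arange(1024)
tone = np.sin(2 * np.pi * 125 * n / fs)   # 125 Hz test tone
f0 = autocorr_pitch(tone, fs)
```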
Authors:
Helen M Hanson,
Richard S McGowan,
Kenneth N Stevens,
Robert E Beaudoin,
Paper number 2179
Abstract:
In this paper we describe the development of rules to drive a quasi-articulatory
speech synthesizer, HLsyn. HLsyn has 13 parameters, which are mapped
to the parameters of a formant synthesizer. Its small number of parameters
combined with the computational simplicity of a formant synthesizer
make it a good basis for a text-to-speech system. An overview of the
rule-driven system, called VHLsyn, is presented. The system assumes
a phonetic string as input, and produces HLsyn parameter tracks as
output. These parameter tracks are then used by HLsyn to produce synthesized
speech. Recent work to improve the synthesis of consonants and suprasegmental
effects is described, and is shown to improve the quality of the output
speech. The improvements include refinement of release characteristics
of stop consonants, methods for control of vocal-fold parameters for
voiced and voiceless obstruent consonants, and rules for timing and
intonation.
Authors:
Daniel Tapias,
Carlos García,
Christophe Cazassus,
Paper number 2302
Abstract:
We have verified that, in telephone applications based on speech
recognition, the loudness with which the speech signal is produced
degrades word accuracy when it is lower or higher than normal. For this
reason, we have carried out research with three
goals: (a) gain a better understanding of the Speech Production Loudness
(SPL) phenomenon, (b) find out the parameters of the speech recognizer
that are the most affected by loudness variations, and (c) compute
the effects of SPL and whispery speech in Large Vocabulary Continuous
Speech Recognition (LVCSR). In this paper we report the results of
this study for three different loudnesses (low, normal and high) and
whispery speech. We also report the word accuracy degradation of a
continuous speech recognition system when the speech production loudness
is different than normal as well as the degradation for whispery speech.
The study was done using the TRESVOL Spanish database, which was
designed to study, evaluate and compensate for the effects of loudness
and whispery speech in LVCSR systems.
Authors:
Francesco Beritelli,
Salvatore Casale,
Alfredo Cavallaro,
Paper number 2363
Abstract:
Discontinuous transmission based on speech/pause detection represents
a valid solution to improve the spectral efficiency of new-generation
wireless communication systems. In this context, robust Voice Activity
Detection (VAD) algorithms are required, as traditional solutions present
a high misclassification rate in the presence of the background noise
typical of mobile environments. The Fuzzy Voice Activity Detector (FVAD)
recently proposed in [1] shows that methodologies such as fuzzy
logic are a valid alternative for the activity decision.
In this paper we propose a multichannel approach to activity
detection using both fuzzy logic and time delay estimation. Objective
and subjective tests confirm a significant improvement over traditional
methods, above all in terms of a reduction in activity increase under
non-stationary noise.
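One ingredient of the proposed multichannel approach, time delay estimation, can be sketched with a cross-correlation peak search; the signals, channel count, and delay below are synthetic assumptions for illustration.

```python
import numpy as np

# Cross-correlation time-delay estimation between two channels, one
# ingredient of the multichannel VAD described above. Signals and
# delay are synthetic.

rng = np.random.default_rng(2)
s = rng.standard_normal(2048)
true_delay = 7                      # samples; channel 2 lags channel 1
x1 = s
x2 = np.roll(s, true_delay)

# The lag of the cross-correlation peak estimates the inter-channel
# delay: a stable peak suggests a coherent (speech) source, while
# diffuse background noise yields no stable peak.
xc = np.correlate(x2, x1, mode="full")
est_delay = int(np.argmax(xc)) - (len(x1) - 1)
```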
Authors:
Toshio Irino,
Paper number 1837
Abstract:
Spectral subtraction has been cited most often as a noise suppression
method for speech signals in steady background noise, because it is
basically a non-parametric method and simple enough to implement for
various applications using an FFT. It is also well known, however,
that spectral subtraction produces so-called "musical noise" in the
synthesized sounds. Since such musical noise, even at low levels, often
disturbs human listeners, spectral subtraction has not been very
successful in signal processing applications aimed at human listeners. To
suppress noise without producing musical noise, an alternative method
has been developed using a time-varying, analysis/synthesis gammachirp
filterbank; this was initially proposed as an auditory filterbank.
The present method achieves about the same SNR improvement as spectral
subtraction when using the same information on the non-speech interval.
Moreover, the synthetic sounds only contain steady white-like noise
at reduced levels when the original noise is white. This method is,
therefore, advantageous in various applications for human listeners.
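For reference, the textbook magnitude spectral subtraction whose half-wave rectification gives rise to musical noise can be sketched as follows; the frame size, overlap-free framing, and single-frame noise estimate are simplifying assumptions, and this is the baseline, not the gammachirp-filterbank method.

```python
import numpy as np

# Textbook magnitude spectral subtraction: subtract an estimated noise
# magnitude spectrum from each frame and floor negative values at zero.
# The flooring (half-wave rectification) is what leaves isolated
# spectral peaks heard as "musical noise".

def spectral_subtract(noisy, noise_est, frame=256):
    noise_mag = np.abs(np.fft.rfft(noise_est[:frame]))  # crude estimate
    out = np.zeros_like(noisy)
    for i in range(0, len(noisy) - frame + 1, frame):
        spec = np.fft.rfft(noisy[i:i + frame])
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)  # rectify
        out[i:i + frame] = np.fft.irfft(mag * np.exp(1j * np.angle(spec)))
    return out

fs = 8000
n = np.arange(4096)
rng = np.random.default_rng(3)
clean = np.sin(2 * np.pi * 440 * n / fs)
noise = 0.3 * rng.standard_normal(len(n))
enhanced = spectral_subtract(clean + noise, noise)
```

A practical implementation would use overlap-add with windowing and a smoothed noise estimate; the bare version above is enough to exhibit the rectification step the abstract's alternative method avoids.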
Authors:
Peter S. K. Hansen, Department of Mathematical Modelling, Technical University of Denmark, Building 321, DK-2800 Lyngby, Denmark
Per Christian Hansen, Department of Mathematical Modelling, Technical University of Denmark, Building 321, DK-2800 Lyngby, Denmark
Steffen Duus Hansen, Department of Mathematical Modelling, Technical University of Denmark, Building 321, DK-2800 Lyngby, Denmark
John Aasted Sørensen, Department of Mathematical Modelling, Technical University of Denmark, Building 321, DK-2800 Lyngby, Denmark
Paper number 1863
Abstract:
In this paper the signal subspace approach for nonparametric speech
enhancement is considered. Several algorithms have been proposed in
the literature, but only partly analyzed. Here, the different algorithms
are compared, with emphasis on the limiting factors and the
practical behavior of the estimators. Experimental results show that
the signal subspace approach may lead to a significant enhancement
of the signal to noise ratio of the output signal.
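A minimal signal-subspace sketch, assuming a Hankel embedding and a fixed signal rank (the paper compares several more refined estimators; the rank, matrix size, and test signal here are illustrative):

```python
import numpy as np

# Minimal signal-subspace enhancement: embed the noisy signal in a
# Hankel matrix, truncate its SVD to a presumed signal rank, and
# average the anti-diagonals back into a 1-D signal.

def subspace_enhance(x, m=32, rank=2):
    n = len(x) - m + 1
    H = np.array([x[i:i + m] for i in range(n)])      # n-by-m Hankel
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    Hr = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]  # rank-r approx
    # Average the anti-diagonals to return to a 1-D signal.
    y = np.zeros(len(x))
    counts = np.zeros(len(x))
    for i in range(n):
        y[i:i + m] += Hr[i]
        counts[i:i + m] += 1
    return y / counts

fs = 8000
t = np.arange(1024) / fs
clean = np.sin(2 * np.pi * 300 * t)   # a sinusoid spans a rank-2 subspace
rng = np.random.default_rng(4)
noisy = clean + 0.2 * rng.standard_normal(len(t))
enhanced = subspace_enhance(noisy)
```

Because the noise is spread over all singular directions while the signal concentrates in a few, the truncation discards most of the noise energy, which is the source of the SNR gains the abstract reports.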