Speech Analysis and Enhancement

1: Speech Processing
CELP Coding
Large Vocabulary Recognition
Speech Analysis and Enhancement
Acoustic Modeling I
ASR Systems and Applications
Topics in Speech Coding
Speech Analysis
Low Bit Rate Speech Coding I
Robust Speech Recognition in Noisy Environments
Speaker Recognition
Acoustic Modeling II
Speech Production and Synthesis
Feature Extraction
Robust Speech Recognition and Adaptation
Low Bit Rate Speech Coding II
Speech Understanding
Language Modeling I
2: Speech Processing, Audio and Electroacoustics, and Neural Networks
Acoustic Modeling III
Lexical Issues/Search
Speech Understanding and Systems
Speech Analysis and Quantization
Utterance Verification/Acoustic Modeling
Language Modeling II
Adaptation /Normalization
Speech Enhancement
Topics in Speaker and Language Recognition
Echo Cancellation and Noise Control
Coding
Auditory Modeling, Hearing Aids and Applications of Signal Processing to Audio and Acoustics
Spatial Audio
Music Applications
Application - Pattern Recognition & Speech Processing
Theory & Neural Architecture
Signal Separation
Application - Image & Nonlinear Signal Processing
3: Signal Processing Theory & Methods I
Filter Design and Structures
Detection
Wavelets
Adaptive Filtering: Applications and Implementation
Nonlinear Signals and Systems
Time/Frequency and Time/Scale Analysis
Signal Modeling and Representation
Filterbank and Wavelet Applications
Source and Signal Separation
Filterbanks
Emerging Applications and Fast Algorithms
Frequency and Phase Estimation
Spectral Analysis and Higher Order Statistics
Signal Reconstruction
Adaptive Filter Analysis
Transforms and Statistical Estimation
Markov and Bayesian Estimation and Classification
4: Signal Processing Theory & Methods II, Design and Implementation of Signal Processing Systems, Special Sessions, and Industry Technology Tracks
System Identification, Equalization, and Noise Suppression
Parameter Estimation
Adaptive Filters: Algorithms and Performance
DSP Development Tools
VLSI Building Blocks
DSP Architectures
DSP System Design
Education
Recent Advances in Sampling Theory and Applications
Steganography: Information Embedding, Digital Watermarking, and Data Hiding
Speech Under Stress
Physics-Based Signal Processing
DSP Chips, Architectures and Implementations
DSP Tools and Rapid Prototyping
Communication Technologies
Image and Video Technologies
Automotive Applications / Industrial Signal Processing
Speech and Audio Technologies
Defense and Security Applications
Biomedical Applications
Voice and Media Processing
Adaptive Interference Cancellation
5: Communications, Sensor Array and Multichannel
Source Coding and Compression
Compression and Modulation
Channel Estimation and Equalization
Blind Multiuser Communications
Signal Processing for Communications I
CDMA and Space-Time Processing
Time-Varying Channels and Self-Recovering Receivers
Signal Processing for Communications II
Blind CDMA and Multi-Channel Equalization
Multicarrier Communications
Detection, Classification, Localization, and Tracking
Radar and Sonar Signal Processing
Array Processing: Direction Finding
Array Processing Applications I
Blind Identification, Separation, and Equalization
Antenna Arrays for Communications
Array Processing Applications II
6: Multimedia Signal Processing, Image and Multidimensional Signal Processing, Digital Signal Processing Education
Multimedia Analysis and Retrieval
Audio and Video Processing for Multimedia Applications
Advanced Techniques in Multimedia
Video Compression and Processing
Image Coding
Transform Techniques
Restoration and Estimation
Image Analysis
Object Identification and Tracking
Motion Estimation
Medical Imaging
Image and Multidimensional Signal Processing Applications I
Segmentation
Image and Multidimensional Signal Processing Applications II
Facial Recognition and Analysis
Digital Signal Processing Education


Template-Driven Generation Of Prosodic Information For Chinese Concatenative Synthesis

Authors:

Chung-Hsien Wu, Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C. (Taiwan)
Jau-Hung Chen, Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C. (Taiwan)

Page (NA) Paper number 1360

Abstract:

In this paper, a template-driven generation of prosodic information is proposed for Chinese text-to-speech conversion. A set of monosyllable-based synthesis units is selected from a large continuous speech database. The speech database is employed to establish a word-prosody-based template tree according to four linguistic features: tone combination, word length, part-of-speech (POS) of the word, and word position in the sentence. This template tree stores the prosodic features of a word, including pitch contour, average energy, and syllable duration, for the possible combinations of linguistic features. Two modules, for sentence intonation and template selection, are proposed to generate the target prosodic templates. Experimental results for the TTS conversion system showed that the synthesized prosodic features closely resembled their original counterparts for most syllables in the inside test. Subjective evaluation also confirmed the satisfactory performance of these approaches.
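The word-prosody template tree described above can be pictured as a lookup keyed by the four linguistic features. The sketch below is purely illustrative: the feature values, prosodic templates, and back-off rule are invented for the example and are not the paper's data or selection module.

```python
# Hypothetical sketch of a word-prosody template tree, keyed by
# (tone combination, word length, POS, word position in sentence).
# Values are illustrative prosodic templates: pitch contour (Hz),
# average energy (dB), and per-syllable durations (ms).
templates = {
    ("35", 2, "N", "initial"): {
        "pitch": [220, 240, 230, 210],
        "energy": 62.0,
        "durations": [180, 210],
    },
    ("51", 2, "V", "final"): {
        "pitch": [250, 200, 190, 170],
        "energy": 58.5,
        "durations": [200, 260],
    },
}

def select_template(tone, length, pos, position):
    """Return the stored template for a feature combination, backing off
    to any entry with the same tone combination and word length when the
    exact combination is missing (an assumed, simplified back-off)."""
    key = (tone, length, pos, position)
    if key in templates:
        return templates[key]
    for (t, l, _, _), tpl in templates.items():
        if t == tone and l == length:
            return tpl
    return None

exact = select_template("35", 2, "N", "initial")
backoff = select_template("35", 2, "A", "medial")  # falls back on tone + length
```

In a real system the leaves would be trained from the continuous speech database and the selection module would score candidate templates against the sentence intonation model.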

IC991360.PDF (From Author) IC991360.PDF (Rasterized)



Speech Enhancement Using Nonlinear Microphone Array with Complementary Beamforming

Authors:

Hiroshi Saruwatari
Shoji Kajita
Kazuya Takeda
Fumitada Itakura

Page (NA) Paper number 1669

Abstract:

This paper describes an improved spectral subtraction method that uses a complementary beamforming microphone array to enhance noisy speech signals for speech recognition. The complementary beamforming is based on two beamformers designed to have mutually complementary directivity patterns. It is shown that nonlinear subtraction processing with complementary beamforming amounts to a form of spectral subtraction that requires no speech-pause detection. In addition, the design of the optimization algorithm for the directivity pattern is described. To evaluate its effectiveness, speech enhancement and speech recognition experiments are performed in computer simulations. Compared with an optimized conventional delay-and-sum array, the proposed array improves the signal-to-noise ratio of degraded speech by about 2 dB and performs about 10% better in word recognition rate under heavy noise conditions.
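The core nonlinear step can be sketched as power-spectral subtraction between the two complementary beamformer outputs. The sketch below is a hedged toy illustration: the synthetic one-frame signals and the flooring constant are assumptions, not the authors' array design.

```python
import numpy as np

# Toy stand-ins for one FFT frame from each beamformer: `primary` is
# target-plus-noise, `complementary` is noise-dominated.
rng = np.random.default_rng(0)
n_fft = 256
primary = rng.normal(size=n_fft) + 2.0   # target + noise (synthetic)
complementary = rng.normal(size=n_fft)   # noise reference (synthetic)

P_primary = np.abs(np.fft.rfft(primary)) ** 2
P_comp = np.abs(np.fft.rfft(complementary)) ** 2

# Subtract the complementary (noise-reference) power spectrum, flooring
# at a small fraction of the noisy power to avoid negative estimates --
# the noise reference comes from the array geometry, which is why no
# explicit speech-pause detection is needed.
floor = 0.01 * P_primary
P_clean = np.maximum(P_primary - P_comp, floor)
```

Classic spectral subtraction would instead estimate the noise power during detected speech pauses; here the second beamformer supplies it continuously.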

IC991669.PDF (From Author) IC991669.PDF (Rasterized)



A Multivariate Speech Activity Detector Based on the Syllable Rate

Authors:

David C Smith
Jeffrey Townsend
Douglas J Nelson
Dan Richman

Page (NA) Paper number 1756

Abstract:

Computationally efficient speech extraction algorithms have significant potential economic benefit, since they automate an extremely tedious manual process. Previously developed algorithms discriminate between speech and one specific other signal type, and often fail when that non-speech signal is replaced by a different signal type. Moreover, several such signal-specific discriminators have been combined, with predictably negative results. When the number of discriminating features is large, compression methods such as Principal Components have been applied to reduce the dimension, even though information may be lost in the process. In this paper, graphical tools are applied to determine a set of features that produces excellent speech vs. non-speech clustering. This cluster structure provides the basis for a general speech vs. non-speech discriminator, which significantly outperforms the TALKATIVE speech extraction algorithm.

IC991756.PDF (From Author) IC991756.PDF (Rasterized)



Discriminating Speakers With Vocal Nodules Using Aerodynamic And Acoustic Features

Authors:

Jeff Kuo
Eva B. Holmberg
Robert E. Hillman

Page (NA) Paper number 1789

Abstract:

This paper demonstrates that linear discriminant analysis using aerodynamic and acoustic features is effective in discriminating speakers with vocal-fold nodules from normal speakers. Simultaneous aerodynamic and acoustic measurements of vocal function were taken of 14 women with bilateral vocal-fold nodules and 12 women with normal voice production. Features were extracted from the glottal airflow waveform and peaks in the acoustic spectrum for the vowel /æ/. Results show that the subglottal pressure, air flow, and open quotient are increased in the nodules group. Estimated first-formant bandwidths are increased, but result in minimal change in the first-formant amplitudes. There is no appreciable decrease in high frequency energy. Speakers with nodules may be compensating for the nodules by increasing the subglottal pressure, resulting in relatively good acoustics but increased air flows. The two best features for discrimination are open quotient and subglottal pressure.
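As an illustration of the kind of two-feature linear discriminant the paper arrives at, the sketch below fits Fisher's discriminant to synthetic samples of the two best features (open quotient, subglottal pressure). The group means, spreads, and units are invented for the example and do not reproduce the study's measurements.

```python
import numpy as np

# Synthetic feature vectors [open quotient, subglottal pressure (cm H2O)]
# for 12 control speakers and 14 speakers with nodules; the nodules group
# is given higher values of both, mirroring the reported trend.
rng = np.random.default_rng(1)
normal = rng.normal([0.55, 6.0], [0.05, 0.5], size=(12, 2))
nodules = rng.normal([0.70, 8.0], [0.05, 0.5], size=(14, 2))

# Fisher discriminant direction: w = Sw^-1 (m1 - m0), with pooled
# within-class scatter Sw.
m0, m1 = normal.mean(axis=0), nodules.mean(axis=0)
Sw = np.cov(normal.T) * (len(normal) - 1) + np.cov(nodules.T) * (len(nodules) - 1)
w = np.linalg.solve(Sw, m1 - m0)
threshold = w @ (m0 + m1) / 2.0

# Classify by projecting onto w and comparing with the midpoint threshold.
pred_nodules = nodules @ w > threshold
pred_normal = normal @ w > threshold
accuracy = (pred_nodules.sum() + (~pred_normal).sum()) / 26.0
```

With well-separated synthetic groups the projection separates the classes almost perfectly; the study's real data would of course overlap more.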

IC991789.PDF (From Author) IC991789.PDF (Rasterized)



Enhancement of Esophageal Speech Using Formant Synthesis

Authors:

Kenji Matsui
Noriyo Hara

Page (NA) Paper number 1831

Abstract:

The feasibility of using a formant analysis-synthesis approach to replace the voicing sources of esophageal speech was explored. The voicing sources were generated from inverse-filtered signals extracted from normal speakers. Various pitch extraction methods were tested, and a simple autocorrelation method was chosen. A special hardware unit was designed to perform the analysis-synthesis process in real time. Results of a subjective test showed that the synthesized speech was significantly improved.
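The simple autocorrelation pitch extraction the authors settled on can be sketched as follows; the synthetic 200 Hz test signal, sampling rate, and search range are assumptions for illustration (the real system analyzes speech in dedicated hardware).

```python
import numpy as np

# Synthetic vowel-like frame: 200 Hz fundamental plus a weaker harmonic.
fs = 8000
f0 = 200.0
t = np.arange(1024) / fs
frame = np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 2 * f0 * t)

# Autocorrelation of the frame; keep non-negative lags only.
ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]

# Search for the peak lag inside a plausible pitch range (50-400 Hz)
# so the zero-lag maximum and sub-harmonics are excluded.
lo, hi = int(fs / 400), int(fs / 50)
lag = lo + int(np.argmax(ac[lo:hi]))
estimated_f0 = fs / lag
```

Restricting the lag search window is what keeps this "simple" method usable: without it, the global autocorrelation maximum is always at lag zero.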

IC991831.PDF (From Author) IC991831.PDF (Rasterized)



Development of Rules for Controlling the HLsyn Speech Synthesizer

Authors:

Helen M Hanson
Richard S McGowan
Kenneth N Stevens
Robert E Beaudoin

Page (NA) Paper number 2179

Abstract:

In this paper we describe the development of rules to drive a quasi-articulatory speech synthesizer, HLsyn. HLsyn has 13 parameters, which are mapped to the parameters of a formant synthesizer. Its small number of parameters combined with the computational simplicity of a formant synthesizer make it a good basis for a text-to-speech system. An overview of the rule-driven system, called VHLsyn, is presented. The system assumes a phonetic string as input, and produces HLsyn parameter tracks as output. These parameter tracks are then used by HLsyn to produce synthesized speech. Recent work to improve the synthesis of consonants and suprasegmental effects is described, and is shown to improve the quality of the output speech. The improvements include refinement of release characteristics of stop consonants, methods for control of vocal-fold parameters for voiced and voiceless obstruent consonants, and rules for timing and intonation.

IC992179.PDF (From Author) IC992179.PDF (Rasterized)



On The Characteristics And Effects Of Loudness During Utterance Production In Continuous Speech Recognition

Authors:

Daniel Tapias
Carlos García
Christophe Cazassus

Page (NA) Paper number 2302

Abstract:

We have verified that, in speech-recognition-based telephone applications, the loudness with which the speech signal is produced degrades word accuracy when it is lower or higher than normal. For this reason, we have carried out a study with three goals: (a) gain a better understanding of the Speech Production Loudness (SPL) phenomenon, (b) find the parameters of the speech recognizer that are most affected by loudness variations, and (c) quantify the effects of SPL and whispery speech on Large Vocabulary Continuous Speech Recognition (LVCSR). In this paper we report the results of this study for three loudness levels (low, normal, and high) and for whispery speech. We also report the word accuracy degradation of a continuous speech recognition system when the speech production loudness differs from normal, as well as the degradation for whispery speech. The study was done using the TRESVOL Spanish database, which was designed to study, evaluate, and compensate for the effects of loudness and whispery speech in LVCSR systems.

IC992302.PDF (From Author) IC992302.PDF (Rasterized)



A Multi-Channel Speech/Silence Detector Based on Time Delay Estimation and Fuzzy Classification

Authors:

Francesco Beritelli
Salvatore Casale
Alfredo Cavallaro

Page (NA) Paper number 2363

Abstract:

Discontinuous transmission based on speech/pause detection is a valid way to improve the spectral efficiency of new-generation wireless communication systems. In this context, robust Voice Activity Detection (VAD) algorithms are required, as traditional solutions exhibit a high misclassification rate in the presence of the background noise typical of mobile environments. The Fuzzy Voice Activity Detector (FVAD) recently proposed in [1] shows that methodologies such as fuzzy logic are a valid alternative for the activity decision. In this paper we propose a multichannel approach to activity detection using both fuzzy logic and time delay estimation. Objective and subjective tests confirm a significant improvement over traditional methods, above all in terms of a reduced activity increase for non-stationary noise.
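A fuzzy activity decision of the general flavor described here might combine per-feature membership values with a t-norm. The membership shapes, feature ranges, and threshold below are invented for illustration and are not the FVAD of [1].

```python
# Toy fuzzy speech/silence decision over two features: frame energy and
# a time-delay (cross-channel) coherence measure.

def trapezoid(x, a, b):
    """Membership rising linearly from 0 at `a` to 1 at `b`."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def fuzzy_vad(energy_db, tdoa_coherence):
    # Hypothetical membership ranges for the two features.
    mu_energy = trapezoid(energy_db, 30.0, 50.0)
    mu_coherence = trapezoid(tdoa_coherence, 0.2, 0.8)
    score = min(mu_energy, mu_coherence)  # conjunction via the min t-norm
    return score > 0.5, score

active, score = fuzzy_vad(55.0, 0.9)   # loud and coherent -> speech
inactive, _ = fuzzy_vad(55.0, 0.1)     # loud but incoherent -> noise
```

The point of the second (time-delay) feature is visible in the example: loud but spatially incoherent frames, which defeat an energy-only detector, are rejected.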

IC992363.PDF (Scanned)



Noise Suppression Using A Time-Varying, Analysis/Synthesis Gammachirp Filterbank

Authors:

Toshio Irino

Page (NA) Paper number 1837

Abstract:

Spectral subtraction is the most commonly cited noise suppression method for speech signals in steady background noise, because it is basically non-parametric and simple to implement for various applications using the FFT. It is also well known, however, that spectral subtraction produces so-called "musical noise" in the synthesized sound. Since such musical noise, even at low levels, can bother human listeners, spectral subtraction has not been very successful in signal processing applications for human listening. To suppress noise without producing musical noise, an alternative method has been developed using a time-varying, analysis/synthesis gammachirp filterbank, which was initially proposed as an auditory filterbank. The present method achieves about the same SNR improvement as spectral subtraction when given the same information about the non-speech interval. Moreover, the synthesized sound contains only steady white-like noise at reduced levels when the original noise is white. The method is therefore advantageous in various applications for human listeners.

IC991837.PDF (From Author) IC991837.PDF (Rasterized)



Experimental Comparison of Signal Subspace Based Noise Reduction Methods

Authors:

Peter S. K. Hansen, Department of Mathematical Modelling, Technical University of Denmark, Building 321, DK-2800 Lyngby, Denmark (Denmark)
Per Christian Hansen, Department of Mathematical Modelling, Technical University of Denmark, Building 321, DK-2800 Lyngby, Denmark (Denmark)
Steffen Duus Hansen, Department of Mathematical Modelling, Technical University of Denmark, Building 321, DK-2800 Lyngby, Denmark (Denmark)
John Aasted Sørensen, Department of Mathematical Modelling, Technical University of Denmark, Building 321, DK-2800 Lyngby, Denmark (Denmark)

Page (NA) Paper number 1863

Abstract:

In this paper the signal subspace approach to nonparametric speech enhancement is considered. Several algorithms have been proposed in the literature but only partly analyzed. Here, the different algorithms are compared, with emphasis on the limiting factors and practical behavior of the estimators. Experimental results show that the signal subspace approach can lead to a significant enhancement of the signal-to-noise ratio of the output signal.
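A generic member of the signal subspace family, useful for seeing what the compared algorithms have in common, is the truncated-SVD estimator sketched below. The Hankel embedding length, retained rank, and sinusoidal test signal are assumptions for the example, not any specific algorithm from the paper.

```python
import numpy as np

# Synthetic "speech" frame: a sinusoid in white noise.
rng = np.random.default_rng(2)
n, L = 256, 32
t = np.arange(n)
clean = np.sin(2 * np.pi * 0.05 * t)
noisy = clean + 0.3 * rng.normal(size=n)

# Hankel embedding: rows are overlapping length-L snapshots.
H = np.lib.stride_tricks.sliding_window_view(noisy, L)

# Truncate the SVD to the presumed signal rank (one real sinusoid
# spans a rank-2 subspace) and discard the noise subspace.
U, s, Vt = np.linalg.svd(H, full_matrices=False)
rank = 2
H_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]

# Reconstruct by averaging each anti-diagonal of the low-rank matrix
# (each signal sample appears in several snapshots).
enhanced = np.zeros(n)
counts = np.zeros(n)
for i in range(H_hat.shape[0]):
    enhanced[i:i + L] += H_hat[i]
    counts[i:i + L] += 1
enhanced /= counts

snr_in = 10 * np.log10(np.sum(clean**2) / np.sum((noisy - clean) ** 2))
snr_out = 10 * np.log10(np.sum(clean**2) / np.sum((enhanced - clean) ** 2))
```

The algorithms the paper compares differ mainly in how they weight the retained singular components and estimate the signal rank; the embed/truncate/average skeleton is shared.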

IC991863.PDF (From Author) IC991863.PDF (Rasterized)
