Speech Enhancement

1: Speech Processing
CELP Coding
Large Vocabulary Recognition
Speech Analysis and Enhancement
Acoustic Modeling I
ASR Systems and Applications
Topics in Speech Coding
Speech Analysis
Low Bit Rate Speech Coding I
Robust Speech Recognition in Noisy Environments
Speaker Recognition
Acoustic Modeling II
Speech Production and Synthesis
Feature Extraction
Robust Speech Recognition and Adaptation
Low Bit Rate Speech Coding II
Speech Understanding
Language Modeling I
2: Speech Processing, Audio and Electroacoustics, and Neural Networks
Acoustic Modeling III
Lexical Issues/Search
Speech Understanding and Systems
Speech Analysis and Quantization
Utterance Verification/Acoustic Modeling
Language Modeling II
Adaptation/Normalization
Speech Enhancement
Topics in Speaker and Language Recognition
Echo Cancellation and Noise Control
Coding
Auditory Modeling, Hearing Aids and Applications of Signal Processing to Audio and Acoustics
Spatial Audio
Music Applications
Application - Pattern Recognition & Speech Processing
Theory & Neural Architecture
Signal Separation
Application - Image & Nonlinear Signal Processing
3: Signal Processing Theory & Methods I
Filter Design and Structures
Detection
Wavelets
Adaptive Filtering: Applications and Implementation
Nonlinear Signals and Systems
Time/Frequency and Time/Scale Analysis
Signal Modeling and Representation
Filterbank and Wavelet Applications
Source and Signal Separation
Filterbanks
Emerging Applications and Fast Algorithms
Frequency and Phase Estimation
Spectral Analysis and Higher Order Statistics
Signal Reconstruction
Adaptive Filter Analysis
Transforms and Statistical Estimation
Markov and Bayesian Estimation and Classification
4: Signal Processing Theory & Methods II, Design and Implementation of Signal Processing Systems, Special Sessions, and Industry Technology Tracks
System Identification, Equalization, and Noise Suppression
Parameter Estimation
Adaptive Filters: Algorithms and Performance
DSP Development Tools
VLSI Building Blocks
DSP Architectures
DSP System Design
Education
Recent Advances in Sampling Theory and Applications
Steganography: Information Embedding, Digital Watermarking, and Data Hiding
Speech Under Stress
Physics-Based Signal Processing
DSP Chips, Architectures and Implementations
DSP Tools and Rapid Prototyping
Communication Technologies
Image and Video Technologies
Automotive Applications / Industrial Signal Processing
Speech and Audio Technologies
Defense and Security Applications
Biomedical Applications
Voice and Media Processing
Adaptive Interference Cancellation
5: Communications, Sensor Array and Multichannel
Source Coding and Compression
Compression and Modulation
Channel Estimation and Equalization
Blind Multiuser Communications
Signal Processing for Communications I
CDMA and Space-Time Processing
Time-Varying Channels and Self-Recovering Receivers
Signal Processing for Communications II
Blind CDMA and Multi-Channel Equalization
Multicarrier Communications
Detection, Classification, Localization, and Tracking
Radar and Sonar Signal Processing
Array Processing: Direction Finding
Array Processing Applications I
Blind Identification, Separation, and Equalization
Antenna Arrays for Communications
Array Processing Applications II
6: Multimedia Signal Processing, Image and Multidimensional Signal Processing, Digital Signal Processing Education
Multimedia Analysis and Retrieval
Audio and Video Processing for Multimedia Applications
Advanced Techniques in Multimedia
Video Compression and Processing
Image Coding
Transform Techniques
Restoration and Estimation
Image Analysis
Object Identification and Tracking
Motion Estimation
Medical Imaging
Image and Multidimensional Signal Processing Applications I
Segmentation
Image and Multidimensional Signal Processing Applications II
Facial Recognition and Analysis
Digital Signal Processing Education


Subspace State Space Model Identification For Speech Enhancement

Authors:

Eric J Grivel, Equipe Signal et Image, B.P. 99, F-33 402 Talence Cedex, France.
Marcel G Gabrea, Equipe Signal et Image, B.P. 99, F-33 402 Talence Cedex, France.
Mohamed Najim, Equipe Signal et Image, B.P. 99, F-33 402 Talence Cedex, France.

Page (NA) Paper number 1622

Abstract:

This paper deals with Kalman filter-based enhancement of a speech signal contaminated by white noise, using a single-microphone system. The problem can be stated as a realization issue in the framework of identification. To this end, we propose identifying the state space model using non-iterative subspace algorithms based on orthogonal projections. Unlike Estimate-Maximize (EM)-based algorithms, this approach provides, in a single iteration from the noisy observations, the matrices of the state space model and the covariance matrices necessary to perform Kalman filtering. In addition, unlike existing methods, no voice activity detector is required. Both methods proposed here are compared with classical approaches.
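
Once the state-space matrices and noise covariances are identified, enhancement reduces to a standard Kalman recursion. A minimal scalar sketch, assuming a known AR(1) state model with illustrative parameters (this is the generic filtering step, not the authors' subspace identification):

```python
import numpy as np

def kalman_denoise(y, a, q, r):
    """Scalar Kalman filter for the state model x[n] = a*x[n-1] + w[n],
    observed as y[n] = x[n] + v[n], with Var(w) = q and Var(v) = r."""
    x_est, p = 0.0, 1.0
    out = np.empty_like(y)
    for n, yn in enumerate(y):
        # time update (prediction)
        x_pred = a * x_est
        p_pred = a * a * p + q
        # measurement update
        k = p_pred / (p_pred + r)            # Kalman gain
        x_est = x_pred + k * (yn - x_pred)
        p = (1.0 - k) * p_pred
        out[n] = x_est
    return out

rng = np.random.default_rng(0)
N = 2000
x = np.zeros(N)
for n in range(1, N):                        # synthetic AR(1) "speech"
    x[n] = 0.95 * x[n - 1] + rng.normal(scale=0.1)
y = x + rng.normal(scale=0.5, size=N)        # white observation noise
x_hat = kalman_denoise(y, a=0.95, q=0.01, r=0.25)
err_noisy = np.mean((y - x) ** 2)            # noise-floor MSE, ~0.25
err_filt = np.mean((x_hat - x) ** 2)         # should be much smaller
```

In practice the point of the paper is that `a`, `q`, and `r` (in matrix form) come from the subspace identification rather than being known a priori.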

IC991622.PDF (From Author) IC991622.PDF (Rasterized)



Using AR HMM State-Dependent Filtering for Speech Enhancement

Authors:

Driss Matrouf, LIMSI-CNRS (France)
Jean-Luc S Gauvain, LIMSI-CNRS (France)

Page (NA) Paper number 1705

Abstract:

In this paper we address the problem of enhancing speech which has been degraded by additive noise. As proposed by Ephraim et al., autoregressive hidden Markov models (AR-HMMs) for the clean speech and an autoregressive Gaussian model for the noise are used. The filter applied to a given frame of noisy speech is estimated using the noise model and the autoregressive Gaussian having the highest a posteriori probability given the decoded state sequence. The success of this technique depends heavily on accurate estimation of the best state sequence. A new strategy combining cepstral-based HMMs, autoregressive HMMs, and a model combination technique is proposed. The intelligibility of the enhanced speech is indirectly assessed via speech recognition, by comparing performance on noisy speech with compensated models to performance on the enhanced speech with clean-speech models. The results on enhanced speech are as good as our best results obtained with noise-compensated models.
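
The state-dependent filtering step can be illustrated in isolation: given the AR coefficients of the selected clean-speech Gaussian and a white-noise variance, the frame filter is a Wiener filter built from the AR spectrum. A sketch under those assumptions (the HMM decoding and model combination that select the state are omitted, and the parameter values are illustrative):

```python
import numpy as np

def ar_wiener_filter(a, sigma_s2, sigma_n2, nfft=512):
    """Non-causal Wiener filter for one AR state.  The clean-speech PSD
    S_s(w) = sigma_s2 / |A(w)|^2 comes from the AR coefficients a
    (convention: x[n] = sum_k a[k] * x[n-k-1] + e[n]); the noise is
    white with variance sigma_n2.  H(w) = S_s(w) / (S_s(w) + sigma_n2)."""
    k = np.arange(1, len(a) + 1)
    grid = np.exp(-2j * np.pi * np.outer(np.arange(nfft), k) / nfft)
    A = 1.0 - grid @ a                       # A(e^{jw}) on an nfft grid
    S_s = sigma_s2 / np.abs(A) ** 2
    return S_s / (S_s + sigma_n2)

# a strongly low-pass AR(1) state in mild white noise
H = ar_wiener_filter(np.array([0.9]), sigma_s2=1.0, sigma_n2=0.1)
```

The gain is close to 1 where the AR spectrum dominates the noise (low frequencies here) and attenuates elsewhere, which is why picking the wrong state sequence is so damaging.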

IC991705.PDF (From Author) IC991705.PDF (Rasterized)



Tracking Speech-Presence Uncertainty to Improve Speech Enhancement In Non-Stationary Noise Environments

Authors:

David Malah
Richard V. Cox
Anthony J Accardi

Page (NA) Paper number 1761

Abstract:

Speech enhancement algorithms which are based on estimating the short-time spectral amplitude of the clean speech perform better when a soft-decision gain modification, depending on the a priori probability of speech absence, is used. In previously reported work, a fixed probability, q, is assumed. Since speech is non-stationary and may not be present in every frequency bin even when voiced, we propose a method for estimating distinct values of q for different bins, which are tracked in time. The estimation is based on a decision-theoretic approach for setting a threshold in each bin, followed by short-time averaging. The estimated q's are used to control both the gain and the update of the estimated noise spectrum during speech presence in a modified MMSE log-spectral amplitude estimator. Subjective tests resulted in higher scores than for the IS-127 standard enhancement algorithm when pre-processing noisy speech for a coding application.
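
The per-bin soft-decision quantity can be sketched under the standard Gaussian model: given the a posteriori SNR gamma, the a priori SNR xi, and a speech-absence probability q, the posterior speech-presence probability follows from a likelihood ratio. This is the generic textbook form, not the paper's tracking scheme for q:

```python
import numpy as np

def presence_probability(gamma, xi, q):
    """Posterior speech-presence probability for one frequency bin
    under the Gaussian model: gamma = a posteriori SNR, xi = a priori
    SNR, q = a priori probability of speech absence."""
    v = gamma * xi / (1.0 + xi)
    lam = ((1.0 - q) / q) * np.exp(v) / (1.0 + xi)   # likelihood ratio
    return lam / (1.0 + lam)

p_hi = presence_probability(gamma=20.0, xi=5.0, q=0.5)  # strong bin
p_lo = presence_probability(gamma=0.5, xi=5.0, q=0.5)   # weak bin
```

The paper's contribution is to replace the fixed q in this expression with per-bin values estimated and tracked over time.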

IC991761.PDF (From Author) IC991761.PDF (Rasterized)



Adaptive Two-Band Spectral Subtraction with Multi-window Spectral Estimation

Authors:

Chuang He
George Zweig

Page (NA) Paper number 1809

Abstract:

An improved spectral subtraction algorithm for enhancing speech corrupted by additive wideband noise is described. The artifactual noise introduced by spectral subtraction, perceived as musical noise, is 7 dB lower than that introduced by the classical spectral subtraction algorithm of Berouti et al. Speech is decomposed into voiced and unvoiced sections. Since voiced speech is primarily stochastic at high frequencies, the voiced speech is high-pass filtered to extract its stochastic component, with the cut-off frequency estimated adaptively. Multi-window spectral estimation is used to estimate the spectra of the stochastic voiced component and of unvoiced speech, thereby reducing the spectral variance. A low-pass filter is used to extract the deterministic component of voiced speech; its spectrum is estimated with a single window. Spectral subtraction is then performed with the classical algorithm using the estimated spectra. Informal listening tests confirm that the new algorithm creates significantly less musical noise than the classical algorithm.
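
The classical subtraction step that the algorithm builds on can be sketched as follows (Berouti-style power subtraction with an over-subtraction factor alpha and a spectral floor beta; the values are illustrative, not those of the paper):

```python
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, alpha=2.0, beta=0.01):
    """Berouti-style power spectral subtraction: over-subtract the
    estimated noise power (factor alpha) and keep a spectral floor
    (factor beta) to limit musical noise."""
    power = noisy_mag ** 2 - alpha * noise_mag ** 2
    floor = beta * noise_mag ** 2
    return np.sqrt(np.maximum(power, floor))

noisy = np.array([10.0, 1.0])    # one strong bin, one noise-level bin
noise = np.array([1.0, 1.0])     # estimated noise magnitude per bin
clean = spectral_subtract(noisy, noise)
```

The paper's improvement comes from feeding this step better spectral estimates (multi-window for the stochastic parts, single-window for the deterministic part), which lowers the variance that causes musical noise in the first place.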

IC991809.PDF (From Author) IC991809.PDF (Rasterized)



Speech Enhancement Using Voice Source Models

Authors:

Anisa Yasmin
Paul W Fieguth
Li Deng

Page (NA) Paper number 1846

Abstract:

Autoregressive (AR) models have been shown to be effective models of the human vocal tract during voicing. However, the most common model of speech for enhancement purposes, an AR process excited by white noise, fails to capture the periodic nature of voiced speech. Speech synthesis researchers have long recognized this problem and have developed a variety of sophisticated excitation models; however, these models have yet to make an impact in speech enhancement. We have chosen one of the most common excitation models, the four-parameter LF model of Fant, Liljencrants and Lin, and applied it to the enhancement of individual voiced phonemes. Comparing the performance of the conventional white-noise-driven AR model, an impulse-driven AR model, and an AR model based on the LF excitation shows that the LF model yields a substantial improvement, on the order of 1.3 dB.

IC991846.PDF (From Author) IC991846.PDF (Rasterized)



Adaptive Decorrelation Filtering for Separation of Co-Channel Speech Signals from M > 2 Sources

Authors:

Kuan-Chieh Yen, University of Illinois at Urbana-Champaign (USA)
Yunxin Zhao, University of Missouri - Columbia (USA)

Page (NA) Paper number 2016

Abstract:

The ADF algorithm of Weinstein, Feder, and Oppenheim for separating two signal sources is generalized to the separation of co-channel speech signals from more than two sources. The system configuration, its accompanying ADF algorithm, and the choice of adaptation gain are derived. The applicability and limitations of the derived algorithm are also discussed. Experiments were conducted on the separation of three speech sources with acoustic paths measured in an office environment, and the algorithm was shown to improve the average target-to-interference ratio of the three sources by approximately 15 dB.
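
The underlying decorrelation idea can be sketched for the simplest case: an instantaneous two-source mixture with equal cross-coupling. The paper's ADF instead uses FIR coupling filters and handles M > 2 sources, so this is only a toy illustration of the adaptation principle:

```python
import numpy as np

def decorrelate(x1, x2, mu=0.001):
    """Adaptive decorrelation for an instantaneous two-source mixture
    x1 = s1 + c*s2, x2 = s2 + c*s1 with equal (unknown) cross-coupling
    c.  The estimate is nudged until the two outputs are uncorrelated,
    at which point they recover the sources."""
    c_hat = 0.0
    y1 = np.empty_like(x1)
    y2 = np.empty_like(x2)
    for n in range(len(x1)):
        d = 1.0 - c_hat * c_hat              # mixing-matrix determinant
        y1[n] = (x1[n] - c_hat * x2[n]) / d
        y2[n] = (x2[n] - c_hat * x1[n]) / d
        c_hat += mu * y1[n] * y2[n]          # drive E[y1*y2] toward 0
    return y1, y2, c_hat

rng = np.random.default_rng(1)
s1 = rng.normal(size=50000)                  # independent "sources"
s2 = rng.normal(size=50000)
x1 = s1 + 0.4 * s2                           # co-channel mixtures
x2 = s2 + 0.4 * s1
y1, y2, c_hat = decorrelate(x1, x2)
mix_corr = np.corrcoef(x1, x2)[0, 1]         # strong before separation
out_corr = np.corrcoef(y1[-20000:], y2[-20000:])[0, 1]
```

With convolutive acoustic paths, each scalar `c_hat` becomes an adaptive FIR filter and the update is a filtered cross-correlation, which is where the choice of adaptation gain discussed in the paper becomes critical.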

IC992016.PDF (From Author) IC992016.PDF (Rasterized)



Audio Signal Noise Reduction Using Multi-resolution Sinusoidal Modeling

Authors:

David V Anderson
Mark A Clements

Page (NA) Paper number 2052

Abstract:

The sinusoidal transform (ST) provides a sparse representation for speech signals by exploiting several psychoacoustic phenomena. It is well suited to signal enhancement because the signal is represented in a parametric form that is easy to manipulate. The multi-resolution sinusoidal transform (MRST) has the additional advantage of being both particularly well suited to typical speech signals and well matched to the human auditory system. The present work discusses the removal of noise from a noisy signal by applying an adaptive Wiener filter to the MRST parameters and then conditioning the parameters to eliminate "musical noise." In informal tests, MRST-based noise reduction was found to reduce background noise significantly better than traditional Wiener filtering and to virtually eliminate the "musical noise" often associated with Wiener filtering.
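
The Wiener-plus-conditioning step can be sketched on a vector of sinusoidal amplitudes. This is a crude stand-in for the paper's MRST-parameter processing: the gain floor here plays the role of the conditioning that suppresses musical noise, and all values are illustrative:

```python
import numpy as np

def wiener_condition(amps, noise_power, gain_floor=0.1):
    """Per-component Wiener gain applied to sinusoidal amplitudes.
    noise_power is the estimated noise power near each component; the
    gain floor prevents isolated components from being gated to zero,
    which is what produces musical noise."""
    signal_power = np.maximum(amps ** 2 - noise_power, 0.0)
    gain = signal_power / (signal_power + noise_power)
    return np.maximum(gain, gain_floor) * amps

amps = np.array([5.0, 0.5])      # one strong, one noise-dominated component
out = wiener_condition(amps, noise_power=1.0)
```

Operating on a handful of sinusoidal parameters per frame, rather than on every FFT bin, is what makes the parametric representation easy to condition.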

IC992052.PDF (From Author) IC992052.PDF (Rasterized)



Utilizing Interband Acoustical Information for Modeling Stationary Time-Frequency Regions of Noisy Speech

Authors:

Chang D Yoo, Korea Telecom (Korea)

Page (NA) Paper number 2435

Abstract:

A novel enhancement system is developed that exploits the properties of stationary regions localized in both time and frequency. The system selects stationary time-frequency regions and adaptively enhances each region according to its local signal-to-noise ratio, utilizing both acoustical knowledge of speech and the masking properties of the human auditory system. Each region is enhanced for maximum noise reduction while minimizing distortion. The proposed system is evaluated through informal listening tests and several objective measures.

IC992435.PDF (From Author) IC992435.PDF (Rasterized)
