Speech Enhancement

1: Speech Processing
CELP Coding
Large Vocabulary Recognition
Speech Analysis and Enhancement
Acoustic Modeling I
ASR Systems and Applications
Topics in Speech Coding
Speech Analysis
Low Bit Rate Speech Coding I
Robust Speech Recognition in Noisy Environments
Speaker Recognition
Acoustic Modeling II
Speech Production and Synthesis
Feature Extraction
Robust Speech Recognition and Adaptation
Low Bit Rate Speech Coding II
Speech Understanding
Language Modeling I
2: Speech Processing, Audio and Electroacoustics, and Neural Networks
Acoustic Modeling III
Lexical Issues/Search
Speech Understanding and Systems
Speech Analysis and Quantization
Utterance Verification/Acoustic Modeling
Language Modeling II
Adaptation/Normalization
Speech Enhancement
Topics in Speaker and Language Recognition
Echo Cancellation and Noise Control
Coding
Auditory Modeling, Hearing Aids and Applications of Signal Processing to Audio and Acoustics
Spatial Audio
Music Applications
Application - Pattern Recognition & Speech Processing
Theory & Neural Architecture
Signal Separation
Application - Image & Nonlinear Signal Processing
3: Signal Processing Theory & Methods I
Filter Design and Structures
Detection
Wavelets
Adaptive Filtering: Applications and Implementation
Nonlinear Signals and Systems
Time/Frequency and Time/Scale Analysis
Signal Modeling and Representation
Filterbank and Wavelet Applications
Source and Signal Separation
Filterbanks
Emerging Applications and Fast Algorithms
Frequency and Phase Estimation
Spectral Analysis and Higher Order Statistics
Signal Reconstruction
Adaptive Filter Analysis
Transforms and Statistical Estimation
Markov and Bayesian Estimation and Classification
4: Signal Processing Theory & Methods II, Design and Implementation of Signal Processing Systems, Special Sessions, and Industry Technology Tracks
System Identification, Equalization, and Noise Suppression
Parameter Estimation
Adaptive Filters: Algorithms and Performance
DSP Development Tools
VLSI Building Blocks
DSP Architectures
DSP System Design
Education
Recent Advances in Sampling Theory and Applications
Steganography: Information Embedding, Digital Watermarking, and Data Hiding
Speech Under Stress
Physics-Based Signal Processing
DSP Chips, Architectures and Implementations
DSP Tools and Rapid Prototyping
Communication Technologies
Image and Video Technologies
Automotive Applications / Industrial Signal Processing
Speech and Audio Technologies
Defense and Security Applications
Biomedical Applications
Voice and Media Processing
Adaptive Interference Cancellation
5: Communications, Sensor Array and Multichannel
Source Coding and Compression
Compression and Modulation
Channel Estimation and Equalization
Blind Multiuser Communications
Signal Processing for Communications I
CDMA and Space-Time Processing
Time-Varying Channels and Self-Recovering Receivers
Signal Processing for Communications II
Blind CDMA and Multi-Channel Equalization
Multicarrier Communications
Detection, Classification, Localization, and Tracking
Radar and Sonar Signal Processing
Array Processing: Direction Finding
Array Processing Applications I
Blind Identification, Separation, and Equalization
Antenna Arrays for Communications
Array Processing Applications II
6: Multimedia Signal Processing, Image and Multidimensional Signal Processing, Digital Signal Processing Education
Multimedia Analysis and Retrieval
Audio and Video Processing for Multimedia Applications
Advanced Techniques in Multimedia
Video Compression and Processing
Image Coding
Transform Techniques
Restoration and Estimation
Image Analysis
Object Identification and Tracking
Motion Estimation
Medical Imaging
Image and Multidimensional Signal Processing Applications I
Segmentation
Image and Multidimensional Signal Processing Applications II
Facial Recognition and Analysis
Digital Signal Processing Education


Subspace State Space Model Identification For Speech Enhancement

Authors:

Eric J Grivel, Equipe Signal et Image, B.P. 99, F-33 402 Talence Cedex, France.
Marcel G Gabrea, Equipe Signal et Image, B.P. 99, F-33 402 Talence Cedex, France.
Mohamed Najim, Equipe Signal et Image, B.P. 99, F-33 402 Talence Cedex, France.

Page (NA) Paper number 1622

Abstract:

This paper deals with Kalman filter-based enhancement of a speech signal contaminated by white noise, using a single-microphone system. The problem can be stated as a realization issue in the framework of identification. To this end, we propose identifying the state space model using non-iterative subspace algorithms based on orthogonal projections. Unlike Estimate-Maximize (EM)-based algorithms, this approach provides, in a single iteration from the noisy observations, the matrices of the state space model and the covariance matrices necessary to perform Kalman filtering. In addition, unlike existing methods, no voice activity detector is required. Both methods proposed here are compared with classical approaches.
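
Once the state-space matrices and noise covariances are identified, enhancement reduces to a standard Kalman recursion. A minimal scalar sketch, assuming a known AR(1) state model with illustrative parameters (this is the generic filtering step, not the authors' subspace identification):

```python
import numpy as np

def kalman_denoise(y, a, q, r):
    """Scalar Kalman filter for the state model x[n] = a*x[n-1] + w[n],
    observed as y[n] = x[n] + v[n], with Var(w) = q and Var(v) = r."""
    x_est, p = 0.0, 1.0
    out = np.empty_like(y)
    for n, yn in enumerate(y):
        # time update (prediction)
        x_pred = a * x_est
        p_pred = a * a * p + q
        # measurement update
        k = p_pred / (p_pred + r)            # Kalman gain
        x_est = x_pred + k * (yn - x_pred)
        p = (1.0 - k) * p_pred
        out[n] = x_est
    return out

rng = np.random.default_rng(0)
N = 2000
x = np.zeros(N)
for n in range(1, N):                        # synthetic AR(1) "speech"
    x[n] = 0.95 * x[n - 1] + rng.normal(scale=0.1)
y = x + rng.normal(scale=0.5, size=N)        # white observation noise
x_hat = kalman_denoise(y, a=0.95, q=0.01, r=0.25)
err_noisy = np.mean((y - x) ** 2)            # noise-floor MSE, ~0.25
err_filt = np.mean((x_hat - x) ** 2)         # should be much smaller
```

In practice the point of the paper is that `a`, `q`, and `r` (in matrix form) come from the subspace identification rather than being known a priori.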

IC991622.PDF (From Author) IC991622.PDF (Rasterized)



Using AR HMM State-Dependent Filtering for Speech Enhancement

Authors:

Driss Matrouf, LIMSI-CNRS (France)
Jean-Luc S Gauvain, LIMSI-CNRS (France)

Page (NA) Paper number 1705

Abstract:

In this paper we address the problem of enhancing speech which has been degraded by additive noise. As proposed by Ephraim et al., autoregressive hidden Markov models (AR-HMMs) for the clean speech and an autoregressive Gaussian model for the noise are used. The filter applied to a given frame of noisy speech is estimated using the noise model and the autoregressive Gaussian having the highest a posteriori probability given the decoded state sequence. The success of this technique depends heavily on accurate estimation of the best state sequence. A new strategy combining cepstral-based HMMs, autoregressive HMMs, and a model combination technique is proposed. The intelligibility of the enhanced speech is indirectly assessed via speech recognition, by comparing performance on noisy speech with compensated models to performance on the enhanced speech with clean-speech models. The results on enhanced speech are as good as our best results obtained with noise-compensated models.
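
The state-dependent filtering step can be illustrated in isolation: given the AR coefficients of the selected clean-speech Gaussian and a white-noise variance, the frame filter is a Wiener filter built from the AR spectrum. A sketch under those assumptions (the HMM decoding and model combination that select the state are omitted, and the parameter values are illustrative):

```python
import numpy as np

def ar_wiener_filter(a, sigma_s2, sigma_n2, nfft=512):
    """Non-causal Wiener filter for one AR state.  The clean-speech PSD
    S_s(w) = sigma_s2 / |A(w)|^2 comes from the AR coefficients a
    (convention: x[n] = sum_k a[k] * x[n-k-1] + e[n]); the noise is
    white with variance sigma_n2.  H(w) = S_s(w) / (S_s(w) + sigma_n2)."""
    k = np.arange(1, len(a) + 1)
    grid = np.exp(-2j * np.pi * np.outer(np.arange(nfft), k) / nfft)
    A = 1.0 - grid @ a                       # A(e^{jw}) on an nfft grid
    S_s = sigma_s2 / np.abs(A) ** 2
    return S_s / (S_s + sigma_n2)

# a strongly low-pass AR(1) state in mild white noise
H = ar_wiener_filter(np.array([0.9]), sigma_s2=1.0, sigma_n2=0.1)
```

The gain is close to 1 where the AR spectrum dominates the noise (low frequencies here) and attenuates elsewhere, which is why picking the wrong state sequence is so damaging.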

IC991705.PDF (From Author) IC991705.PDF (Rasterized)



Tracking Speech-Presence Uncertainty to Improve Speech Enhancement In Non-Stationary Noise Environments

Authors:

David Malah
Richard V. Cox
Anthony J Accardi

Page (NA) Paper number 1761

Abstract:

Speech enhancement algorithms which are based on estimating the short-time spectral amplitude of the clean speech perform better when a soft-decision gain modification, depending on the a priori probability of speech absence, is used. In previously reported work, a fixed probability, q, is assumed. Since speech is non-stationary and may not be present in every frequency bin even when voiced, we propose a method for estimating distinct values of q for different bins, which are tracked in time. The estimation is based on a decision-theoretic approach for setting a threshold in each bin, followed by short-time averaging. The estimated q's are used to control both the gain and the update of the estimated noise spectrum during speech presence in a modified MMSE log-spectral amplitude estimator. Subjective tests resulted in higher scores than for the IS-127 standard enhancement algorithm when pre-processing noisy speech for a coding application.
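
The per-bin soft-decision quantity can be sketched under the standard Gaussian model: given the a posteriori SNR gamma, the a priori SNR xi, and a speech-absence probability q, the posterior speech-presence probability follows from a likelihood ratio. This is the generic textbook form, not the paper's tracking scheme for q:

```python
import numpy as np

def presence_probability(gamma, xi, q):
    """Posterior speech-presence probability for one frequency bin
    under the Gaussian model: gamma = a posteriori SNR, xi = a priori
    SNR, q = a priori probability of speech absence."""
    v = gamma * xi / (1.0 + xi)
    lam = ((1.0 - q) / q) * np.exp(v) / (1.0 + xi)   # likelihood ratio
    return lam / (1.0 + lam)

p_hi = presence_probability(gamma=20.0, xi=5.0, q=0.5)  # strong bin
p_lo = presence_probability(gamma=0.5, xi=5.0, q=0.5)   # weak bin
```

The paper's contribution is to replace the fixed q in this expression with per-bin values estimated and tracked over time.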

IC991761.PDF (From Author) IC991761.PDF (Rasterized)



Adaptive Two-Band Spectral Subtraction with Multi-window Spectral Estimation

Authors:

Chuang He
George Zweig

Page (NA) Paper number 1809

Abstract:

An improved spectral subtraction algorithm for enhancing speech corrupted by additive wideband noise is described. The artifactual noise introduced by spectral subtraction, perceived as musical noise, is 7 dB lower than that introduced by the classical spectral subtraction algorithm of Berouti et al. Speech is decomposed into voiced and unvoiced sections. Since voiced speech is primarily stochastic at high frequencies, the voiced speech is high-pass filtered to extract its stochastic component, with the cut-off frequency estimated adaptively. Multi-window spectral estimation is used to estimate the spectra of the stochastic voiced component and of unvoiced speech, thereby reducing the spectral variance. A low-pass filter is used to extract the deterministic component of voiced speech; its spectrum is estimated with a single window. Spectral subtraction is then performed with the classical algorithm using the estimated spectra. Informal listening tests confirm that the new algorithm creates significantly less musical noise than the classical algorithm.
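
The classical subtraction step that the algorithm builds on can be sketched as follows (Berouti-style power subtraction with an over-subtraction factor alpha and a spectral floor beta; the values are illustrative, not those of the paper):

```python
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, alpha=2.0, beta=0.01):
    """Berouti-style power spectral subtraction: over-subtract the
    estimated noise power (factor alpha) and keep a spectral floor
    (factor beta) to limit musical noise."""
    power = noisy_mag ** 2 - alpha * noise_mag ** 2
    floor = beta * noise_mag ** 2
    return np.sqrt(np.maximum(power, floor))

noisy = np.array([10.0, 1.0])    # one strong bin, one noise-level bin
noise = np.array([1.0, 1.0])     # estimated noise magnitude per bin
clean = spectral_subtract(noisy, noise)
```

The paper's improvement comes from feeding this step better spectral estimates (multi-window for the stochastic parts, single-window for the deterministic part), which lowers the variance that causes musical noise in the first place.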

IC991809.PDF (From Author) IC991809.PDF (Rasterized)



Speech Enhancement Using Voice Source Models

Authors:

Anisa Yasmin
Paul W Fieguth
Li Deng

Page (NA) Paper number 1846

Abstract:

Autoregressive (AR) models have been shown to be effective models of the human vocal tract during voicing. However, the most common model of speech for enhancement purposes, an AR process excited by white noise, fails to capture the periodic nature of voiced speech. Speech synthesis researchers have long recognized this problem and have developed a variety of sophisticated excitation models; however, these models have yet to make an impact in speech enhancement. We have chosen one of the most common excitation models, the four-parameter LF model of Fant, Liljencrants and Lin, and applied it to the enhancement of individual voiced phonemes. Comparing the performance of the conventional white-noise-driven AR model, an impulse-driven AR model, and an AR model based on the LF excitation shows that the LF model yields a substantial improvement, on the order of 1.3 dB.

IC991846.PDF (From Author) IC991846.PDF (Rasterized)



Adaptive Decorrelation Filtering for Separation of Co-Channel Speech Signals from M > 2 Sources

Authors:

Kuan-Chieh Yen, University of Illinois at Urbana-Champaign (USA)
Yunxin Zhao, University of Missouri - Columbia (USA)

Page (NA) Paper number 2016

Abstract:

The ADF algorithm of Weinstein, Feder, and Oppenheim for separating two signal sources is generalized to the separation of co-channel speech signals from more than two sources. The system configuration, its accompanying ADF algorithm, and the choice of adaptation gain are derived. The applicability and limitations of the derived algorithm are also discussed. Experiments were conducted on the separation of three speech sources with acoustic paths measured in an office environment, and the algorithm was shown to improve the average target-to-interference ratio of the three sources by approximately 15 dB.
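
The underlying decorrelation idea can be sketched for the simplest case: an instantaneous two-source mixture with equal cross-coupling. The paper's ADF instead uses FIR coupling filters and handles M > 2 sources, so this is only a toy illustration of the adaptation principle:

```python
import numpy as np

def decorrelate(x1, x2, mu=0.001):
    """Adaptive decorrelation for an instantaneous two-source mixture
    x1 = s1 + c*s2, x2 = s2 + c*s1 with equal (unknown) cross-coupling
    c.  The estimate is nudged until the two outputs are uncorrelated,
    at which point they recover the sources."""
    c_hat = 0.0
    y1 = np.empty_like(x1)
    y2 = np.empty_like(x2)
    for n in range(len(x1)):
        d = 1.0 - c_hat * c_hat              # mixing-matrix determinant
        y1[n] = (x1[n] - c_hat * x2[n]) / d
        y2[n] = (x2[n] - c_hat * x1[n]) / d
        c_hat += mu * y1[n] * y2[n]          # drive E[y1*y2] toward 0
    return y1, y2, c_hat

rng = np.random.default_rng(1)
s1 = rng.normal(size=50000)                  # independent "sources"
s2 = rng.normal(size=50000)
x1 = s1 + 0.4 * s2                           # co-channel mixtures
x2 = s2 + 0.4 * s1
y1, y2, c_hat = decorrelate(x1, x2)
mix_corr = np.corrcoef(x1, x2)[0, 1]         # strong before separation
out_corr = np.corrcoef(y1[-20000:], y2[-20000:])[0, 1]
```

With convolutive acoustic paths, each scalar `c_hat` becomes an adaptive FIR filter and the update is a filtered cross-correlation, which is where the choice of adaptation gain discussed in the paper becomes critical.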

IC992016.PDF (From Author) IC992016.PDF (Rasterized)



Audio Signal Noise Reduction Using Multi-resolution Sinusoidal Modeling

Authors:

David V Anderson
Mark A Clements

Page (NA) Paper number 2052

Abstract:

The sinusoidal transform (ST) provides a sparse representation for speech signals by exploiting several psychoacoustic phenomena. It is well suited to signal enhancement because the signal is represented in a parametric form that is easy to manipulate. The multi-resolution sinusoidal transform (MRST) has the additional advantage of being both particularly well suited to typical speech signals and well matched to the human auditory system. The present work discusses the removal of noise from a noisy signal by applying an adaptive Wiener filter to the MRST parameters and then conditioning the parameters to eliminate "musical noise." In informal tests, MRST-based noise reduction was found to reduce background noise significantly better than traditional Wiener filtering and to virtually eliminate the "musical noise" often associated with Wiener filtering.
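
The Wiener-plus-conditioning step can be sketched on a vector of sinusoidal amplitudes. This is a crude stand-in for the paper's MRST-parameter processing: the gain floor here plays the role of the conditioning that suppresses musical noise, and all values are illustrative:

```python
import numpy as np

def wiener_condition(amps, noise_power, gain_floor=0.1):
    """Per-component Wiener gain applied to sinusoidal amplitudes.
    noise_power is the estimated noise power near each component; the
    gain floor prevents isolated components from being gated to zero,
    which is what produces musical noise."""
    signal_power = np.maximum(amps ** 2 - noise_power, 0.0)
    gain = signal_power / (signal_power + noise_power)
    return np.maximum(gain, gain_floor) * amps

amps = np.array([5.0, 0.5])      # one strong, one noise-dominated component
out = wiener_condition(amps, noise_power=1.0)
```

Operating on a handful of sinusoidal parameters per frame, rather than on every FFT bin, is what makes the parametric representation easy to condition.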

IC992052.PDF (From Author) IC992052.PDF (Rasterized)



Utilizing Interband Acoustical Information for Modeling Stationary Time-Frequency Regions of Noisy Speech

Authors:

Chang D Yoo, Korea Telecom (Korea)

Page (NA) Paper number 2435

Abstract:

A novel enhancement system is developed that exploits the properties of stationary regions localized in both time and frequency. The system selects stationary time-frequency regions and adaptively enhances each region according to its local signal-to-noise ratio, utilizing both acoustical knowledge of speech and the masking properties of the human auditory system. Each region is enhanced for maximum noise reduction while minimizing distortion. The proposed system is evaluated through informal listening tests and several objective measures.

IC992435.PDF (From Author) IC992435.PDF (Rasterized)
