SPEECH ENHANCEMENT & NOISE REDUCTION

Chair: John H.L. Hansen, Duke University (USA)



Speech Enhancement Based on Masking Properties of the Auditory System

Authors:

Nathalie Virag, Swiss Federal Institute of Technology (SWITZERLAND)

Volume 1, Page 796

Abstract:

This paper addresses the problem of enhancing the intelligibility of speech corrupted by additive background noise in a single-channel system. The proposed algorithm uses a criterion based on human perception. It is a variation of the well-known spectral subtraction method, which is attractive because of its simplicity but introduces an unnatural and unpleasant residual noise. The proposed approach incorporates into this method the noise-masking properties of the auditory system and finds the best tradeoff between noise reduction and speech distortion in a perceptual sense. Simulations show perceptually very satisfactory results, and objective measures indicate a quality improvement. Speech processed with this new algorithm sounds more pleasant to a human listener than speech obtained with the classical methods, which demonstrates the relevance of incorporating perceptual aspects into the enhancement process.
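
As a rough illustration of the masking-driven idea described above, and not of the author's exact algorithm, the following Python sketch adapts the over-subtraction factor of spectral subtraction per frequency bin according to a precomputed masking threshold; the function name, the linear mapping from threshold to factor, and all parameter values are illustrative assumptions.

    import numpy as np

    def masking_adapted_subtraction(noisy_frame, noise_psd, masking_threshold,
                                    alpha_max=6.0, alpha_min=1.0, floor=0.01):
        """Subtract noise aggressively where residual noise would be audible
        (low masking threshold) and gently where it would be masked."""
        spec = np.fft.rfft(noisy_frame * np.hanning(len(noisy_frame)))
        power = np.abs(spec) ** 2
        # Normalize the threshold and map it to a per-bin over-subtraction factor:
        # a high threshold means residual noise is masked, so subtract less.
        t = masking_threshold / (masking_threshold.max() + 1e-12)
        alpha = alpha_max - (alpha_max - alpha_min) * t
        clean_power = np.maximum(power - alpha * noise_psd, floor * noise_psd)
        enhanced = np.sqrt(clean_power) * np.exp(1j * np.angle(spec))
        return np.fft.irfft(enhanced, n=len(noisy_frame))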

300dpi TIFF Images of pages:

796 797 798 799

Acrobat PDF file of whole paper:

ic950796.pdf




Optimizing Speech Enhancement by Exploiting Masking Properties of the Human Ear

Authors:

A. Akbari Azirani, Universite de Rennes I (FRANCE)
R. Le Bouquin Jeanne, Universite de Rennes I (FRANCE)
G. Faucon, Universite de Rennes I (FRANCE)

Volume 1, Page 800

Abstract:

The problem of speech enhancement, and mainly noise reduction, remains a key point of hands-free telecommunications. A great number of techniques have already been put forward, and in recent years auditory models have been investigated for noise reduction. In this paper, a new approach for enhancing a speech signal degraded by uncorrelated, stationary, additive noise is developed. The approach exploits the simultaneous masking effect of the human ear. Two states, noise masked and noise unmasked, are derived from a noise masking threshold computed from a rough estimate of the speech signal. A speech signal estimator is then proposed as a weighted sum of the individual estimators in each state. The gain in signal-to-noise ratio (SNR) and a distortion measure indicate some improvement in real noise conditions. Subjectively, this improvement is noticeable only at high input SNRs.
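
A minimal Python sketch of the two-state idea, under the assumption (ours, not necessarily the authors') that the masked/unmasked decision can be softened into a per-bin weight; the weighting function and the choice of per-state estimators are illustrative.

    import numpy as np

    def two_state_estimate(noisy_mag, noise_psd, masking_threshold):
        """Weighted sum of a 'noise masked' estimator (bin left almost untouched)
        and a 'noise unmasked' estimator (Wiener-type suppression)."""
        # Soft weight for the 'masked' state: large when the noise level
        # in a bin lies below the masking threshold.
        p_masked = 1.0 / (1.0 + noise_psd / (masking_threshold + 1e-12))
        est_masked = noisy_mag                               # masked: no processing needed
        snr = np.maximum(noisy_mag ** 2 - noise_psd, 0.0) / (noise_psd + 1e-12)
        est_unmasked = (snr / (1.0 + snr)) * noisy_mag       # unmasked: Wiener-type gain
        return p_masked * est_masked + (1.0 - p_masked) * est_unmasked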

300dpi TIFF Images of pages:

800 801 802 803

Acrobat PDF file of whole paper:

ic950800.pdf




A Spectrally-Based Signal Subspace Approach for Speech Enhancement

Authors:

Yariv Ephraim, George Mason University (USA)
Harry L. Van Trees, George Mason University (USA)

Volume 1, Page 804

Abstract:

The signal subspace approach for enhancing speech signals degraded by uncorrelated additive noise is studied. The underlying principle is to decompose the vector space of the noisy signal into a signal-plus-noise subspace and a noise subspace. Enhancement is performed by removing the noise subspace and estimating the clean signal from the remaining signal subspace. The decomposition can theoretically be performed by applying the Karhunen-Loeve transform (KLT) to the noisy signal. Linear estimation of the clean signal is performed using a perceptually meaningful estimation criterion. The estimator is designed by minimizing signal distortion for a fixed desired spectrum of the residual noise. This criterion enables masking of the residual noise by the speech signal. The filter is implemented as a gain function which modifies the KLT components corresponding to the signal subspace. The gain function depends solely on the desired spectrum of the residual noise. Listening tests indicate that 14 out of 16 listeners strongly preferred the proposed approach over the spectral subtraction approach.
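
The following Python sketch shows the general shape of a signal-subspace enhancer; it uses the simpler time-domain-constrained gain lambda / (lambda + mu * sigma^2) rather than the paper's spectral-domain constraint, so it should be read as a simplified stand-in rather than the authors' estimator.

    import numpy as np

    def subspace_enhance(noisy_frames, noise_var, mu=1.0):
        """noisy_frames: (num_frames, dim) array of short noisy-signal vectors
        with white noise of variance noise_var. Returns enhanced vectors."""
        cov = np.cov(noisy_frames, rowvar=False)           # empirical covariance -> KLT basis
        eigvals, eigvecs = np.linalg.eigh(cov)
        clean_eig = np.maximum(eigvals - noise_var, 0.0)   # estimated signal-subspace eigenvalues
        gain = clean_eig / (clean_eig + mu * noise_var)    # per-component gain; zero in the
                                                           # noise subspace, which is removed
        H = eigvecs @ np.diag(gain) @ eigvecs.T            # filter expressed in the signal domain
        return noisy_frames @ H.T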

300dpi TIFF Images of pages:

804 805 806 807

Acrobat PDF file of whole paper:

ic950804.pdf




Real-Time Implementation of HMM-Based MMSE Algorithm for Speech Enhancement in Hearing Aid Applications

Authors:

H. Sheikhzadeh, University of Waterloo
R.L. Brennan, Unitron Industries Ltd.
H. Sameti, University of Waterloo (CANADA)

Volume 1, Page 808

Abstract:

In this paper we describe our recent work on the real-time implementation of a state-of-the-art HMM-based MMSE speech enhancement algorithm, in which our earlier published algorithm has been approximated, optimized, and simplified. The key innovations that enable the enhancement system to run in real time are: 1) an algorithm for automatic selection of the noise model, 2) pruning of the MMSE forward calculations, 3) a new pause detection method that operates at SNRs down to 0 dB, and 4) task partitioning of the entire system, all developed from our more recent work. A preliminary version of the real-time enhancement system is simulated on an IBM-PC 486DX2/66 and, in parallel, implemented on a DSP platform based on dual TMS320C30 DSP chips using single-precision floating-point arithmetic.
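
As an illustration only (the paper's actual estimator and pruning rule are not reproduced here), the sketch below combines per-state Wiener gains with HMM state posteriors and prunes low-posterior states before the weighted sum, which is the computational idea behind item 2 above; all names and the pruning threshold are assumptions.

    import numpy as np

    def hmm_mmse_gain(state_clean_psd, state_post, noise_psd, prune=1e-3):
        """state_clean_psd: (num_states, num_bins) clean-speech PSDs per HMM state.
        state_post: (num_states,) state posteriors for the current frame."""
        keep = state_post > prune
        if not keep.any():                                 # always keep at least the best state
            keep = state_post == state_post.max()
        post = state_post[keep] / state_post[keep].sum()   # renormalize surviving states
        wiener = state_clean_psd[keep] / (state_clean_psd[keep] + noise_psd)
        return post @ wiener                               # MMSE gain: posterior-weighted sum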

300dpi TIFF Images of pages:

808 809 810 811

Acrobat PDF file of whole paper:

ic950808.pdf




New Methods for Adaptive Noise Suppression

Authors:

Levent Arslan, Texas Instruments (USA)
Alan McCree, Texas Instruments (USA)
Vishu Viswanathan, Texas Instruments (USA)

Volume 1, Page 812

Abstract:

We propose three new adaptive noise suppression algorithms for enhancing noise-corrupted speech: smoothed spectral subtraction (SSS), vector quantization of line spectral frequencies (VQ-LSF), and modified Wiener filtering (MWF). SSS is an improved version of the well-known spectral subtraction algorithm, while the other two methods are based on generalized Wiener filtering. We have compared these three algorithms with each other and with spectral subtraction on both simulated noise and actual car noise. All three proposed methods perform substantially better than spectral subtraction, primarily because of the absence of any musical noise artifacts in the processed speech. Listening tests showed preference for MWF and SSS over VQ-LSF. Also, MWF provides a much higher mean opinion score (MOS) than does spectral subtraction. Finally, VQ-LSF provides a relatively good spectral match to the clean speech, and may, therefore, be better suited for speech recognition.
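
A minimal Python sketch of the smoothing idea behind SSS as we read it from the abstract: the spectral-subtraction gain is smoothed recursively over frames so that isolated gain spikes (the source of musical noise) are suppressed; the recursion constant and floor are illustrative values, not the paper's.

    import numpy as np

    def smoothed_spectral_subtraction_gains(frames_psd, noise_psd,
                                            smooth=0.9, floor=0.05):
        """frames_psd: (num_frames, num_bins) noisy power spectra.
        Returns a per-frame, per-bin gain to apply to the noisy spectrum."""
        gains = np.empty_like(frames_psd)
        prev = np.ones(frames_psd.shape[1])
        for t, psd in enumerate(frames_psd):
            raw = np.maximum(psd - noise_psd, 0.0) / (psd + 1e-12)  # instantaneous gain
            prev = smooth * prev + (1.0 - smooth) * raw             # recursive smoothing
            gains[t] = np.maximum(prev, floor)                      # spectral floor
        return gains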

300dpi TIFF Images of pages:

812 813 814 815

Acrobat PDF file of whole paper:

ic950812.pdf




Single-Sensor Speech Enhancement Using a Soft-Decision/Variable Attenuation Algorithm

Authors:

E. Bryan George, Lockheed Sanders Inc. (USA)

Volume 1, Page 816

Abstract:

This paper presents an algorithm for single-sensor enhancement of speech corrupted by additive random noise, based on soft-decision and statistical signal processing concepts and incorporating fully automatic noise estimation and tracking algorithms. The Soft-Decision/Variable Attenuation (SDVA) algorithm uses a compressive noise reduction model within the framework of short-time Fourier processing. The SDVA algorithm is fast, effective, and robust, and has been applied in realistic RF and telephone environments.
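
A hypothetical Python sketch of a soft-decision, bounded-attenuation gain in the short-time Fourier domain; the speech-presence score and the compressive exponent are stand-ins for the SDVA model, which the abstract does not spell out.

    import numpy as np

    def sdva_like_gain(noisy_psd, noise_psd, max_atten_db=15.0, compress=0.5):
        """Per-bin gain: soft speech-presence weighting of a compressive
        suppression rule, with attenuation bounded by max_atten_db."""
        post_snr = np.maximum(noisy_psd / (noise_psd + 1e-12), 1e-6)
        p_speech = 1.0 - np.exp(-np.maximum(post_snr - 1.0, 0.0))   # soft decision
        atten = 10.0 ** (-max_atten_db / 20.0)                      # attenuation bound
        gain_speech = np.maximum(1.0 - 1.0 / post_snr, 0.0) ** compress
        return p_speech * np.maximum(gain_speech, atten) + (1.0 - p_speech) * atten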

300dpi TIFF Images of pages:

816 817 818 819

Acrobat PDF file of whole paper:

ic950816.pdf




Speech Enhancement Using a Ternary-Decision Based Filter

Authors:

T.S. Sun, Martin Marietta Laboratories (USA)
S. Nandkumar, Martin Marietta Laboratories (USA)
J. Carmody, Martin Marietta Laboratories (USA)
J. Rothweiler, Martin Marietta Laboratories (USA)
A. Goldschen, Martin Marietta Laboratories (USA)
N. Russell, Martin Marietta Laboratories (USA)
S. Mpasi, Martin Marietta Laboratories (USA)
P. Green, Martin Marietta Laboratories (USA)

Volume 1, Page 820

Abstract:

A new speech enhancement scheme based on a generalized Wiener filter formulation is proposed. A ternary-valued parameter is derived empirically from the likelihood that the input signal vector is speech. This parameter controls the Wiener filter coefficients in order to obtain an improved speech spectral estimate. The ternary-decision concept represents a logical compromise, in terms of practicality and performance, between simple hard binary speech/noise decision filtering and elaborate soft-decision filtering. An important feature of our scheme is that the interframe spectral relationship is exploited to reinforce the assessment of the likelihood of weak speech components, which prevents many weak formants from being disproportionately attenuated as in most previous schemes. Other important features include a novel speech/noise classifier and a robust noise median amplitude tracker, both of which make the estimate of the noise spectrum more reliable. A preliminary evaluation of this new scheme is reported here.
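
A small Python sketch of the ternary-decision control as we understand it from the abstract: the previous frame's speech score reinforces weak speech, and the resulting three-way decision selects how strongly the noise estimate is subtracted before forming a Wiener-type gain; the thresholds and over-subtraction values are illustrative assumptions.

    import numpy as np

    def ternary_wiener_gain(noisy_psd, noise_psd, speech_score, prev_speech_score,
                            hi=0.8, lo=0.2):
        """Return a per-bin Wiener-type gain controlled by a ternary decision."""
        score = max(speech_score, 0.5 * prev_speech_score)  # interframe reinforcement
        if score > hi:
            over = 1.0        # confident speech: subtract the noise estimate as-is
        elif score < lo:
            over = 3.0        # confident noise: subtract aggressively
        else:
            over = 2.0        # uncertain: intermediate over-subtraction
        clean_psd = np.maximum(noisy_psd - over * noise_psd, 0.05 * noisy_psd)
        return clean_psd / (clean_psd + noise_psd)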

300dpi TIFF Images of pages:

820 821 822 823

Acrobat PDF file of whole paper:

ic950820.pdf




Signal Modeling Enhancements for Automatic Speech Recognition

Authors:

Zaki B. Nossair, Old Dominion University (USA)
Peter L. Silsbee, Old Dominion University (USA)
Stephen A. Zahorian, Old Dominion University (USA)

Volume 1, Page 824

Abstract:

Experiments in modeling speech signals for phoneme classification are described. Enhancements to standard speech processing methods include basis vector representations of dynamic feature trajectories, morphological smoothing (dilation) of spectral features, and the use of many closely spaced, short analysis windows. Results from experiments on the TIMIT database are reported, with up to 71.0% correct classification of 16 presegmented vowels in a noise-free environment and 54.5% correct classification in a 10 dB signal-to-noise-ratio environment.
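
As an illustration of one of the listed enhancements, the following Python sketch applies morphological dilation (a running maximum) to a log-magnitude spectrum; the window width is an arbitrary choice, and this is not claimed to match the authors' exact smoothing.

    import numpy as np

    def dilate_spectrum(log_spectrum, width=3):
        """Morphological dilation of a 1-D log-magnitude spectrum: each bin is
        replaced by the maximum over a small neighborhood, which smooths
        valleys while preserving spectral peaks."""
        half = width // 2
        padded = np.pad(log_spectrum, half, mode='edge')
        return np.array([padded[i:i + width].max()
                         for i in range(len(log_spectrum))])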

300dpi TIFF Images of pages:

824 825 826 827

Acrobat PDF file of whole paper:

ic950824.pdf




Co-Channel Speaker Separation

Authors:

David P. Morgan, Texas Instruments
E. Bryan George, Texas Instruments
Leonard T. Lee, Lockheed Sanders Inc.
Steven M. Kay, University of Rhode Island (USA)

Volume 1, Page 828

Abstract:

This paper describes a system for the automatic separation of two-talker co-channel speech. This system is based on a frame-by-frame speaker separation algorithm that exploits a pitch estimate of the stronger talker derived from the co-channel signal. The concept underlying this approach is to recover the stronger talker's speech by enhancing harmonic frequencies and formants given a multi-resolution pitch estimate. The weaker talker's speech is obtained from the residual signal created when the harmonics and formants of the stronger talker are suppressed. A maximum likelihood speaker assignment algorithm is used to place the recovered frames from the target and interfering talkers in separate channels. The system has been tested at target-to-interferer ratios (TIRs) from -18 to 18 dB with human listening tests, and with machine-based tests employing a keyword spotting system on the Switchboard Corpus for target talkers at 6, 12, and 18 dB TIR.
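
A toy Python sketch of the harmonic-selection idea: given a pitch estimate for the stronger talker, keep the spectral bins near its harmonics and assign the residual to the weaker talker. A binary comb mask is used here purely for illustration; the paper's enhancement of harmonics and formants is considerably more refined.

    import numpy as np

    def separate_frame(noisy_frame, f0, fs, bandwidth_hz=50.0):
        """Split one voiced frame (pitch f0 > 0, sample rate fs) into a
        stronger-talker estimate and a residual for the weaker talker."""
        spec = np.fft.rfft(noisy_frame)
        freqs = np.fft.rfftfreq(len(noisy_frame), d=1.0 / fs)
        dist = np.abs(freqs - f0 * np.round(freqs / f0))   # distance to nearest harmonic
        mask = (dist < bandwidth_hz).astype(float)
        stronger = np.fft.irfft(spec * mask, n=len(noisy_frame))
        weaker = noisy_frame - stronger                    # residual: weaker talker
        return stronger, weaker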

300dpi TIFF Images of pages:

828 829 830 831

Acrobat PDF file of whole paper:

ic950828.pdf




Speech Enhancement Based on the Generalized Dual Excitation Model with Adaptive Analysis Window

Authors:

Chang D. Yoo, Massachusetts Institute of Technology (USA)
Jae S. Lim, Massachusetts Institute of Technology (USA)

Volume 1, Page 832

Abstract:

In this paper, we describe a Generalized Dual Excitation (GDE) speech model that is broader and more accurate in its characterization than the Dual Excitation (DE) model, in that it takes pitch variations into account. This model, together with an analysis window whose length adapts to the changing characteristics of the speech, forms the backbone of a new speech enhancement system. Informal comparisons of the GDE system with traditional systems have shown a clear preference for the former, with a nominal SNR improvement of 0.5 dB to 1 dB over the traditional methods.
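
A hypothetical Python sketch of the adaptive-window idea (the GDE model itself is not reproduced): choose the longest analysis window over which the pitch contour stays nearly constant, so stationary voiced segments get long windows and rapidly changing segments get short ones; the candidate lengths and the 2% bound are assumptions.

    import numpy as np

    def choose_window_length(pitch_track, start, lengths=(160, 320, 640),
                             max_rel_dev=0.02):
        """pitch_track: per-sample pitch values (Hz, > 0); start: sample index.
        Returns the longest candidate window whose pitch variation is small."""
        for length in sorted(lengths, reverse=True):
            seg = pitch_track[start:start + length]
            if len(seg) == length and np.ptp(seg) <= max_rel_dev * np.mean(seg):
                return length
        return min(lengths)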

300dpi TIFF Images of pages:

832 833 834 835

Acrobat PDF file of whole paper:

ic950832.pdf
