Authors:
Eric J Grivel, Equipe Signal et Image, B.P. 99, F-33 402 Talence Cedex, France.
Marcel G Gabrea, Equipe Signal et Image, B.P. 99, F-33 402 Talence Cedex, France.
Mohamed Najim, Equipe Signal et Image, B.P. 99, F-33 402 Talence Cedex, France.
Page (NA) Paper number 1622
Abstract:
This paper deals with Kalman filter-based enhancement of a speech signal
contaminated by white noise, using a single-microphone system. This
problem can be stated as a realization issue in the framework of
identification. For this purpose we propose to identify the state-space
model using non-iterative subspace algorithms based on orthogonal
projections. Unlike Estimate-Maximize (EM)-based algorithms, this approach
provides, in a single iteration from the noisy observations, the
state-space model matrices and the covariance matrices needed to perform
Kalman filtering. In addition, unlike existing methods, no voice activity
detector is required. Both methods proposed here are compared with
classical approaches.
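A minimal sketch of the Kalman filtering step this abstract builds on: the clean signal is modeled as an AR(2) process in state-space form and filtered from noisy observations. The AR coefficients and noise variances below are assumed toy values; the paper obtains the model matrices from subspace identification rather than taking them as known.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical AR(2) "speech" model: s[n] = a1*s[n-1] + a2*s[n-2] + w[n]
a1, a2 = 1.5, -0.7          # assumed AR coefficients
q, r = 1.0, 4.0             # process / observation noise variances
N = 5000

# Simulate the clean signal and its noisy observations
s = np.zeros(N)
for n in range(2, N):
    s[n] = a1*s[n-1] + a2*s[n-2] + rng.normal(0.0, np.sqrt(q))
y = s + rng.normal(0.0, np.sqrt(r), N)

# State-space form: x[n] = F x[n-1] + w[n],  y[n] = H x[n] + v[n]
F = np.array([[a1, a2], [1.0, 0.0]])
H = np.array([[1.0, 0.0]])
Q = np.array([[q, 0.0], [0.0, 0.0]])

x = np.zeros((2, 1))
P = np.eye(2)
s_hat = np.zeros(N)
for n in range(N):
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the new observation
    K = P @ H.T / (H @ P @ H.T + r)
    x = x + K * (y[n] - H @ x)
    P = (np.eye(2) - K @ H) @ P
    s_hat[n] = x[0, 0]

mse_noisy = np.mean((y - s)**2)    # error before filtering
mse_kalman = np.mean((s_hat - s)**2)
```

With the model known, the filtered estimate has a clearly lower mean-squared error than the raw observations; the paper's contribution is estimating F, H, and the covariances from the noisy data alone.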
Authors:
Driss Matrouf, LIMSI-CNRS (France)
Jean-Luc S Gauvain, LIMSI-CNRS (France)
Page (NA) Paper number 1705
Abstract:
In this paper we address the problem of enhancing speech which has
been degraded by additive noise. As proposed by Ephraim et al., autoregressive
hidden Markov models (AR-HMM) for the clean speech and an autoregressive
Gaussian for the noise are used. The filter applied to a given frame
of noisy speech is estimated using the noise model and the autoregressive
Gaussian having the highest a posteriori probability given the decoded
state sequence. The success of this technique is highly dependent on
accurate estimation of the best state sequence. A new strategy combining
the use of cepstral-based HMMs, autoregressive HMMs, and a model combination
technique, is proposed. The intelligibility of the enhanced speech
is indirectly assessed via speech recognition, by comparing performance
on noisy speech with compensated models to performance on the enhanced
speech with clean-speech models. The results on enhanced speech are
as good as our best results obtained with noise compensated models.
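The per-frame filtering step can be illustrated with an AR-model-based Wiener filter: given an AR model for the clean frame and the noise variance, the filter is built from the ratio of the two power spectra. This sketch assumes the AR parameters are known and omits the HMM state selection that the paper is actually about; all numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed AR(2) model for the clean-speech frame, plus white-noise variance
a = np.array([1.0, -1.5, 0.7])    # A(z) coefficients, a[0] = 1
sigma_w2, sigma_n2 = 1.0, 2.0     # excitation / additive-noise variances
K = 512                           # frame length and FFT size

# AR power spectrum: S_s(w) = sigma_w2 / |A(e^{jw})|^2
A = np.fft.rfft(a, K)
S_s = sigma_w2 / np.abs(A)**2
Hw = S_s / (S_s + sigma_n2)       # Wiener gain per frequency bin

# Simulate one clean frame from the same AR model, add noise, filter
s = np.zeros(K)
for n in range(2, K):
    s[n] = 1.5*s[n-1] - 0.7*s[n-2] + rng.normal(0.0, np.sqrt(sigma_w2))
y = s + rng.normal(0.0, np.sqrt(sigma_n2), K)
s_hat = np.fft.irfft(np.fft.rfft(y) * Hw, K)

err_noisy = np.mean((y - s)**2)
err_wiener = np.mean((s_hat - s)**2)
```

In the paper this filter is chosen per frame from the decoded HMM state, so the accuracy of the state sequence directly determines how well the AR spectrum matches the frame.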
Authors:
David Malah,
Richard V. Cox,
Anthony J Accardi,
Page (NA) Paper number 1761
Abstract:
Speech enhancement algorithms which are based on estimating the short-time
spectral amplitude of the clean speech have better performance when
a soft-decision gain modification, depending on the a priori probability
of speech absence, is used. In reported works a fixed probability,
q, is assumed. Since speech is non-stationary and may not be present
in every frequency bin when voiced, we propose a method for estimating
distinct values of q for different bins which are tracked in time.
The estimation is based on a decision-theoretic approach for setting
a threshold in each bin followed by short-time averaging. The estimated
q's are used to control both the gain and the update of the estimated
noise spectrum during speech presence in a modified MMSE log-spectral
amplitude estimator. Subjective tests resulted in higher scores than
for the IS-127 standard enhancement algorithm, when pre-processing
noisy speech for a coding application.
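The soft-decision gain modification can be sketched for a single bin under a Gaussian statistical model. Here a plain Wiener gain stands in for the MMSE log-spectral amplitude estimator the paper modifies, and all numerical values are hypothetical.

```python
import numpy as np

def soft_decision_gain(Y2, lam_n, xi, q):
    """Soft-decision gain for one frequency bin (sketch).

    Y2: observed periodogram value |Y|^2, lam_n: noise PSD estimate,
    xi: a priori SNR, q: a priori probability of speech absence."""
    gamma = Y2 / lam_n                              # a posteriori SNR
    # Likelihood ratio of speech presence (H1) vs absence (H0),
    # complex-Gaussian model for the spectral coefficient
    Lam = ((1.0 - q) / q) * np.exp(gamma * xi / (1.0 + xi)) / (1.0 + xi)
    p_h1 = Lam / (1.0 + Lam)                        # P(speech | Y)
    G = xi / (1.0 + xi)                             # Wiener gain under H1
    return p_h1 * G, p_h1

# Hypothetical strong and weak bins under the same models
g_voiced, p_voiced = soft_decision_gain(Y2=20.0, lam_n=1.0, xi=10.0, q=0.5)
g_silent, p_silent = soft_decision_gain(Y2=0.5, lam_n=1.0, xi=10.0, q=0.5)
```

A bin with high observed energy gets a presence probability near one and nearly the full gain, while a low-energy bin is attenuated; the paper's point is that q itself should be estimated per bin and tracked in time rather than fixed.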
Authors:
Chuang He,
George Zweig,
Page (NA) Paper number 1809
Abstract:
An improved spectral subtraction algorithm for enhancing speech corrupted
by additive wideband noise is described. The artifactual noise introduced
by spectral subtraction that is perceived as musical noise is 7 dB
less than that introduced by the classical spectral subtraction algorithm
of Berouti et al. Speech is decomposed into voiced and unvoiced sections.
Since voiced speech is primarily stochastic at high frequencies, the
voiced speech is high-pass filtered to extract its stochastic component.
The cut-off frequency is estimated adaptively. Multi-window spectral
estimation is used to estimate the spectrum of stochastically voiced
and unvoiced speech, thereby reducing the spectral variance. A low-pass
filter is used to extract the deterministic component of voiced speech.
Its spectrum is estimated with a single window. Spectral subtraction
is performed with the classical algorithm using the estimated spectra.
Informal listening tests confirm that the new algorithm creates significantly
less musical noise than the classical algorithm.
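The classical over-subtraction scheme of Berouti et al. that serves as this paper's baseline can be sketched as follows; the over-subtraction factor and spectral floor below are typical textbook choices, not the paper's values.

```python
import numpy as np

def spectral_subtract(frame, noise_psd, alpha=2.0, beta=0.02):
    """Classical power spectral subtraction on one frame
    (Berouti-style over-subtraction with a spectral floor)."""
    L = len(frame)
    Y = np.fft.rfft(frame * np.hanning(L))
    P = np.abs(Y)**2
    P_hat = np.maximum(P - alpha * noise_psd,   # over-subtract the noise
                       beta * noise_psd)        # but keep a spectral floor
    S = np.sqrt(P_hat) * np.exp(1j * np.angle(Y))  # reuse the noisy phase
    return np.fft.irfft(S, L)

rng = np.random.default_rng(4)
L = 256
win = np.hanning(L)
# Estimate the noise power spectrum from noise-only training frames
noise_psd = np.mean([np.abs(np.fft.rfft(rng.normal(size=L) * win))**2
                     for _ in range(200)], axis=0)
# On a fresh noise-only frame, most of the energy should be removed
frame = rng.normal(size=L)
residual_ratio = (np.sum(spectral_subtract(frame, noise_psd)**2)
                  / np.sum((frame * win)**2))
```

The bins that randomly exceed the subtracted noise estimate are what produce the isolated spectral peaks heard as musical noise; the paper attacks this by reducing the variance of the spectral estimates themselves.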
Authors:
Anisa Yasmin,
Paul W Fieguth,
Li Deng,
Page (NA) Paper number 1846
Abstract:
Autoregressive (AR) models have been shown to be effective models of
the human vocal tract during voicing. However the most common model
of speech for enhancement purposes, an AR process excited by white
noise, fails to capture the periodic nature of voiced speech. Speech
synthesis researchers have long recognized this problem and have developed
a variety of sophisticated excitation models, however these models
have yet to make an impact in speech enhancement. We have chosen one
of the most common excitation models, the four-parameter LF model of
Fant, Liljencrants and Lin, and applied it to the enhancement of individual
voiced phonemes. Comparing the performance of the conventional white-noise-driven
AR, an impulse-driven AR, and an AR based on the LF model shows that
the LF model yields a substantial improvement, on the order of 1.3
dB.
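The role of the excitation model can be illustrated by driving the same AR filter with white noise versus a periodic impulse train. The full four-parameter LF pulse requires solving its implicit parameter equations and is omitted here, so the impulse train is only a stand-in; all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T0 = 4000, 80                  # samples and (assumed) pitch period
a1, a2 = 1.8, -0.9                # hypothetical vocal-tract AR(2)

def ar_synthesize(excitation):
    """Run the AR(2) synthesis filter over an excitation signal."""
    y = np.zeros(len(excitation))
    for n in range(len(y)):
        y[n] = excitation[n]
        if n >= 1:
            y[n] += a1 * y[n-1]
        if n >= 2:
            y[n] += a2 * y[n-2]
    return y

impulses = np.zeros(N)
impulses[::T0] = 1.0                         # periodic glottal-pulse stand-in
v_imp = ar_synthesize(impulses)              # "voiced" synthesis
v_noise = ar_synthesize(rng.normal(size=N))  # white-noise-driven synthesis

def pitch_corr(x, lag):
    """Normalized autocorrelation at the pitch lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rho_imp = pitch_corr(v_imp, T0)       # close to 1: output is periodic
rho_noise = pitch_corr(v_noise, T0)   # near 0: periodicity is lost
```

Only the impulse-driven output retains the long-lag periodicity of voiced speech, which is the structure the white-noise-driven AR model fails to capture.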
Authors:
Kuan-Chieh Yen, University of Illinois at Urbana-Champaign (USA)
Yunxin Zhao, University of Missouri - Columbia (USA)
Page (NA) Paper number 2016
Abstract:
The ADF algorithm for separating two signal sources by Weinstein, Feder,
and Oppenheim is generalized for separation of co-channel speech signals
from more than two sources. The system configuration, its accompanied
ADF algorithm, and the choice of adaptation gain are derived. The applicability
and limitation of the derived algorithm are also discussed. Experiments
were conducted for separation of three speech sources with the acoustic
paths measured from an office environment, and the algorithm was shown
to improve the average target-to-interference ratio for the three sources
by approximately 15 dB.
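A much-simplified illustration of separation by output decorrelation: single-tap instantaneous cross-coupling stands in for the convolutive acoustic paths, and one shared scalar update stands in for the FIR filter adaptation of the actual ADF algorithm. The coupling gains are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 20000
s1, s2 = rng.normal(size=N), rng.normal(size=N)   # independent sources

# Instantaneous cross-coupled mixtures (hypothetical coupling gains)
x1 = s1 + 0.5 * s2
x2 = s2 + 0.3 * s1

# Adapt cross-cancellation taps until the two outputs are decorrelated;
# both taps are driven by the residual output cross-correlation
c12 = c21 = 0.0
mu = 0.1
for _ in range(300):
    y1 = x1 - c12 * x2
    y2 = x2 - c21 * x1
    rho = np.mean(y1 * y2)      # residual cross-correlation
    c12 += mu * rho
    c21 += mu * rho

y1 = x1 - c12 * x2              # final separated outputs
y2 = x2 - c21 * x1
corr_in = abs(np.mean(x1 * x2))
corr_out = abs(np.mean(y1 * y2))
```

Driving the taps toward zero output cross-correlation removes most of the cross-talk; the paper generalizes this idea to FIR separation filters, more than two sources, and derives the appropriate adaptation gain.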
Authors:
David V Anderson,
Mark A Clements,
Page (NA) Paper number 2052
Abstract:
The sinusoidal transform (ST) provides a sparse representation for
speech signals by utilizing several psychoacoustic phenomena. It is
well suited to applications in signal enhancement because the signal
is represented in a parametric manner that is easy to manipulate. The
multi-resolution sinusoidal transform (MRST) has the additional advantage
that it is both particularly well suited to typical speech signals
and well matched to the human auditory system. The currently reported
work discusses the removal of noise from a noisy signal by applying
an adaptive Wiener filter to the MRST parameters and then conditioning
the parameters to eliminate "musical noise." In informal tests, MRST-based
noise reduction was found to reduce background noise significantly
better than traditional Wiener filtering and to virtually eliminate
the "musical noise" often associated with Wiener filtering.
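Gain flooring and frame-to-frame smoothing are common ways to condition adaptive Wiener gains against musical noise. The sketch below is a generic illustration of that conditioning step, not the paper's MRST-domain method; the floor and smoothing constants are assumed values.

```python
import numpy as np

def wiener_gains(snr_est, g_floor=0.1, smooth=0.8, g_prev=None):
    """Per-coefficient Wiener gain with flooring and recursive smoothing.

    snr_est: estimated local SNR per coefficient. Flooring and
    frame-to-frame smoothing suppress the isolated gain spikes
    that are heard as musical noise."""
    g = snr_est / (1.0 + snr_est)        # Wiener gain from local SNR
    g = np.maximum(g, g_floor)           # floor isolated small gains
    if g_prev is not None:
        g = smooth * g_prev + (1.0 - smooth) * g   # smooth across frames
    return g

# Hypothetical SNR estimates for three coefficients
g0 = wiener_gains(np.array([0.0, 1.0, 9.0]))       # -> [0.1, 0.5, 0.9]
g1 = wiener_gains(np.array([9.0]), g_prev=np.array([0.1]))
```

The floor keeps residual noise spectrally flat rather than sparse and tonal, and the smoothing prevents a gain from jumping in a single frame, the two mechanisms usually credited with suppressing musical noise.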
Authors:
Chang D Yoo, Korea Telecom (Korea)
Page (NA) Paper number 2435
Abstract:
A novel enhancement system is developed that exploits the properties
of stationary regions localized in both time and frequency. This system
selects stationary time-frequency regions and adaptively enhances each
region according to its local signal-to-noise ratio, while utilizing
both acoustical knowledge of speech and the masking properties
of the human auditory system. Each region is enhanced for maximum noise
reduction while minimizing distortion. This paper evaluates the proposed
system through informal listening tests and some objective measures.