Abstract: Session SP-25
SP-25.1
SUBSPACE STATE SPACE MODEL IDENTIFICATION FOR SPEECH ENHANCEMENT
Eric J Grivel,
Marcel G Gabrea,
Mohamed Najim (Equipe Signal et Image, B.P. 99, F-33 402 Talence Cedex, France)
This paper deals with Kalman filter-based enhancement of a speech signal contaminated by white noise, using a single-microphone system. The problem can be stated as a realization issue within the framework of identification. For this purpose we propose to identify the state space model using non-iterative subspace algorithms based on orthogonal projections. Unlike Estimate-Maximize (EM)-based algorithms, this approach provides, in a single iteration from the noisy observations, the state space model matrices and the covariance matrices that are necessary to perform Kalman filtering. In addition, unlike existing methods, no voice activity detector is required. Both methods proposed here are compared with classical approaches.
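To make the filtering stage concrete, here is a minimal sketch assuming the companion-form state-space model and noise variances are already known; the paper's actual contribution, identifying these quantities from the noisy observations alone via orthogonal projections, is not reproduced, and all names below are ours.

```python
# Kalman enhancement of a noisy AR(p) signal with a known companion-form
# state-space model. The paper's subspace identification step is assumed
# to have already produced the AR coefficients and noise variances.
import numpy as np

def kalman_enhance(y, a, q, r):
    """y: noisy samples (float array); a: AR coefficients [a1..ap] of the
    clean speech; q: driving-noise variance; r: observation-noise variance."""
    a = np.asarray(a, float)
    p = len(a)
    A = np.zeros((p, p))             # companion (transition) matrix
    A[0, :] = a
    A[1:, :-1] = np.eye(p - 1)
    C = np.zeros(p); C[0] = 1.0      # observe the newest sample
    Q = np.zeros((p, p)); Q[0, 0] = q
    x, P = np.zeros(p), np.eye(p)    # state estimate and covariance
    out = np.empty(len(y))
    for t, yt in enumerate(y):
        x = A @ x                    # time update (predict)
        P = A @ P @ A.T + Q
        s = C @ P @ C + r            # innovation variance
        k = P @ C / s                # Kalman gain
        x = x + k * (yt - C @ x)     # measurement update
        P = P - np.outer(k, C @ P)
        out[t] = x[0]                # enhanced sample
    return out
```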
SP-25.2
Using AR HMM State-Dependent Filtering for Speech Enhancement
Driss Matrouf (LIMSI-CNRS (France)),
Jean-Luc Gauvain (LIMSI-CNRS (France))
In this paper we address the problem of enhancing speech which has
been degraded by additive noise. As proposed by Ephraim et al.,
autoregressive hidden Markov models (AR-HMM) for the clean speech and
an autoregressive Gaussian for the noise are used. The filter applied
to a given frame of noisy speech is estimated using the noise model
and the autoregressive Gaussian having the highest a posteriori
probability given the decoded state sequence. The success of this
technique is highly dependent on accurate estimation of
the best state sequence. A new strategy combining the use of
cepstral-based HMMs, autoregressive HMMs, and a model combination
technique is proposed. The intelligibility of the enhanced speech is
indirectly assessed via speech recognition, by comparing performance
on noisy speech with compensated models to performance on the enhanced
speech with clean-speech models. The results on enhanced speech are as
good as our best results obtained with noise compensated models.
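A minimal sketch of the state-dependent filtering step, under simplifications of ours (this is not LIMSI's system): once a state has been decoded for a frame, a Wiener filter is formed from that state's AR spectrum and the noise AR spectrum and applied in the frequency domain. The AR coefficients and gains would come from trained models.

```python
# Build a per-frame Wiener filter from the decoded state's AR model and
# the noise AR model, then filter the frame. Parameters are placeholders.
import numpy as np

def ar_spectrum(a, g2, nfft):
    """Power spectrum g2 / |1 - sum_k a_k e^{-jwk}|^2 of an AR model."""
    w = np.fft.rfftfreq(nfft) * 2 * np.pi
    denom = np.abs(1 - sum(ak * np.exp(-1j * w * (k + 1))
                           for k, ak in enumerate(a))) ** 2
    return g2 / denom

def enhance_frame(frame, a_state, g2_state, a_noise, g2_noise):
    nfft = len(frame)
    S = ar_spectrum(a_state, g2_state, nfft)   # clean PSD, decoded state
    N = ar_spectrum(a_noise, g2_noise, nfft)   # noise PSD
    H = S / (S + N)                            # Wiener gain
    return np.fft.irfft(np.fft.rfft(frame) * H, nfft)
```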
SP-25.3
Tracking Speech-Presence Uncertainty to Improve Speech Enhancement In Non-Stationary Noise Environments
David Malah,
Richard V Cox,
Anthony J Accardi (AT&T Labs - Research, Florham Park, NJ 07932)
Speech enhancement algorithms which are based on estimating the
short-time spectral amplitude of the clean speech have better
performance when a soft-decision gain modification, depending on
the a priori probability of speech absence, is used. In previously
reported work a fixed probability, q, is assumed. Since speech is
non-stationary and may not be present in every frequency bin when
voiced, we propose a method for estimating distinct values of q
for different bins which are tracked in time. The estimation is
based on a decision-theoretic approach for setting a threshold
in each bin followed by short-time averaging. The estimated q's
are used to control both the gain and the update of the estimated
noise spectrum during speech presence in a modified MMSE
log-spectral amplitude estimator. Subjective tests resulted in
higher scores than for the IS-127 standard enhancement algorithm,
when pre-processing noisy speech for a coding application.
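A simplified sketch of the per-bin tracking idea, with the threshold and smoothing constants invented for illustration; a plain Wiener gain stands in for the paper's MMSE log-spectral amplitude estimator.

```python
# Track a per-bin speech-absence probability q by thresholding the
# a posteriori SNR and short-time averaging the binary decisions; q then
# softens the gain and gates the noise-PSD update. Constants are assumed.
import numpy as np

class PresenceTracker:
    def __init__(self, nbins, thresh=2.0, alpha_q=0.95, alpha_n=0.98):
        self.q = np.full(nbins, 0.5)         # per-bin P(speech absent)
        self.noise_psd = np.ones(nbins)
        self.thresh, self.aq, self.an = thresh, alpha_q, alpha_n

    def step(self, noisy_mag2):
        gamma = noisy_mag2 / self.noise_psd              # a posteriori SNR
        absent = (gamma < self.thresh).astype(float)     # per-bin decision
        self.q = self.aq * self.q + (1 - self.aq) * absent  # time average
        # update the noise estimate mainly where speech is judged absent
        upd = self.q * noisy_mag2 + (1 - self.q) * self.noise_psd
        self.noise_psd = self.an * self.noise_psd + (1 - self.an) * upd
        wiener = np.maximum(1 - 1 / np.maximum(gamma, 1e-6), 0.05)
        return (1 - self.q) * wiener + self.q * 0.05     # soft-decision gain
```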
SP-25.4
Adaptive Two-Band Spectral Subtraction with Multi-window Spectral Estimation
Chuang He (Los Alamos National Laboratory, T Division, MS B276, Los Alamos, NM 87545),
George Zweig (Signition, Inc., 901 18th Street, Los Alamos, NM 87544)
An improved spectral subtraction algorithm for enhancing speech corrupted by
additive wideband noise is described. The artifactual noise introduced by
spectral subtraction that is perceived as musical noise is 7 dB less than that
introduced by the classical spectral subtraction algorithm of Berouti et
al. Speech is decomposed into voiced and unvoiced sections. Since voiced speech
is primarily stochastic at high frequencies, the voiced speech is high-pass
filtered to extract its stochastic component. The cut-off frequency is
estimated adaptively. Multi-window spectral estimation is used to estimate the
spectra of the stochastic component of voiced speech and of unvoiced speech, thereby reducing the
spectral variance. A low-pass filter is used to extract the deterministic
component of voiced speech. Its spectrum is estimated with a single
window. Spectral subtraction is performed with the classical algorithm using
the estimated spectra. Informal listening tests confirm that the new algorithm
creates significantly less musical noise than the classical algorithm.
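The two main ingredients can be sketched as follows, with illustrative parameters (the voiced/unvoiced split and the adaptive cut-off estimation are not reproduced): a multi-window spectrum estimate over orthogonal sine tapers to reduce variance, feeding a Berouti-style over-subtraction rule with a spectral floor.

```python
# Multi-window (sine-taper) PSD estimate plus over-subtraction.
import numpy as np

def multiwindow_psd(x, K=6):
    """Average periodograms over K orthogonal sine tapers."""
    n = len(x)
    t = np.arange(1, n + 1)
    psds = []
    for k in range(1, K + 1):
        taper = np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * k * t / (n + 1))
        psds.append(np.abs(np.fft.rfft(x * taper)) ** 2)
    return np.mean(psds, axis=0)           # lower-variance PSD estimate

def spectral_subtract(noisy, noise_psd, alpha=4.0, beta=0.01):
    """noise_psd: noise PSD estimate, length len(noisy)//2 + 1."""
    spec = np.fft.rfft(noisy)
    noisy_psd = multiwindow_psd(noisy)
    clean_psd = np.maximum(noisy_psd - alpha * noise_psd,  # over-subtract
                           beta * noise_psd)               # spectral floor
    gain = np.sqrt(clean_psd / np.maximum(noisy_psd, 1e-12))
    return np.fft.irfft(spec * gain, len(noisy))
```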
SP-25.5
Speech Enhancement Using Voice Source Models
Anisa Yasmin (Department of Electrical and Computer Engineering, University of Waterloo),
Paul W Fieguth (Dept. of Systems Design - University of Waterloo),
Li Deng (Dept. of Electrical and Computer Engineering, University of Waterloo)
Autoregressive (AR) models have been shown to be effective models of
the human vocal tract during voicing. However, the most common model
of speech for enhancement purposes, an AR process excited by white noise,
fails to capture the periodic nature of voiced speech.
Speech synthesis researchers have long recognized this problem and
have developed a variety of sophisticated excitation models; however,
these models have yet to make an impact in speech enhancement. We have
chosen one of the most common excitation models, the four-parameter
LF model of Fant, Liljencrants and Lin, and applied it to the
enhancement of individual voiced phonemes. Comparing the performance of
the conventional white-noise-driven AR, an impulse-driven AR, and an
AR based on the LF model shows that the LF model yields a substantial
improvement, on the order of 1.3 dB.
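For orientation, an LF-style glottal flow derivative pulse can be sketched as below. The true four-parameter LF model ties its constants together through implicit area-balance equations, which are not solved here: alpha and the return-phase constant are simply taken as given, so the pulse is only qualitatively LF-shaped, and every numeric value is invented.

```python
# One pitch period of an LF-like excitation: an exponentially growing
# sinusoid up to the instant of maximum closing (Te), then an
# exponential return phase back toward zero.
import numpy as np

def lf_pulse(fs=16000, f0=120.0, tp=0.45, te=0.6, ta=0.05,
             alpha=60.0, Ee=1.0):
    """tp, te, ta are fractions of the period; alpha shapes the open phase."""
    T0 = 1.0 / f0
    t = np.arange(int(fs * T0)) / fs
    Tp, Te, Ta = tp * T0, te * T0, ta * T0
    wg = np.pi / Tp                       # glottal flow peaks at t = Tp
    eps = 1.0 / Ta                        # return-phase time constant
    e = np.zeros_like(t)
    open_ph = t <= Te
    e[open_ph] = np.exp(alpha * t[open_ph]) * np.sin(wg * t[open_ph])
    e[open_ph] *= -Ee / e[open_ph][-1]    # force e(Te) = -Ee
    ret = ~open_ph
    e[ret] = -(Ee / (eps * Ta)) * (np.exp(-eps * (t[ret] - Te))
                                   - np.exp(-eps * (T0 - Te)))
    return e
```

A train of such pulses, filtered through an all-pole vocal-tract model (e.g. scipy.signal.lfilter([1.0], a, excitation)), supplies the periodic excitation that the white-noise-driven AR model lacks.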
SP-25.6
Adaptive Decorrelation Filtering for Separation of Co-Channel Speech Signals from M > 2 Sources
Kuan-Chieh Yen (University of Illinois at Urbana-Champaign (USA)),
Yunxin Zhao (University of Missouri - Columbia (USA))
The ADF algorithm for separating two signal sources by Weinstein,
Feder, and Oppenheim is generalized for separation of co-channel
speech signals from more than two sources. The system configuration,
its accompanying ADF algorithm, and the choice of adaptation gain are
derived. The applicability and limitations of the derived algorithm
are also discussed. Experiments were conducted for separation of
three speech sources with the acoustic paths measured from an office
environment, and the algorithm was shown to improve the average
target-to-interference ratio for the three sources by approximately
15 dB.
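The two-source baseline being generalized can be sketched as a pair of cross-coupled adaptive FIR filters whose coefficients are adapted to drive the cross-correlation of the two outputs toward zero; the filter length and adaptation gain below are illustrative, not the paper's derived choices.

```python
# Two-source adaptive decorrelation filtering: each output subtracts an
# adaptive FIR estimate of the other source's leakage.
import numpy as np

def adf_two_sources(x1, x2, L=64, mu=1e-4):
    """x1, x2: co-channel mixtures (float arrays of equal length)."""
    a = np.zeros(L)                   # estimate of cross path 2 -> 1
    b = np.zeros(L)                   # estimate of cross path 1 -> 2
    y1, y2 = np.zeros_like(x1), np.zeros_like(x2)
    for n in range(L, len(x1)):
        seg2 = y2[n - L:n][::-1]      # y2[n-1], ..., y2[n-L]
        seg1 = y1[n - L:n][::-1]
        y1[n] = x1[n] - a @ seg2      # remove source-2 leakage
        y2[n] = x2[n] - b @ seg1      # remove source-1 leakage
        a += mu * y1[n] * seg2        # push E[y1[n] y2[n-k]] toward zero
        b += mu * y2[n] * seg1
    return y1, y2
```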
SP-25.7
Audio Signal Noise Reduction Using Multi-resolution Sinusoidal Modeling
David V Anderson,
Mark A Clements (Georgia Institute of Technology)
The sinusoidal transform (ST)
provides a sparse representation for speech signals by utilizing
several psychoacoustic phenomena. It is well suited to applications
in signal enhancement because the signal is represented in a
parametric manner that is easy to manipulate. The multi-resolution
sinusoidal transform (MRST) has the
additional advantage that it is both particularly well suited to
typical speech signals and well matched to the human auditory
system. The work reported here discusses the removal of noise from a
noisy signal by applying an adaptive Wiener filter to the MRST
parameters and then conditioning the parameters to eliminate "musical
noise." In informal tests, MRST-based noise reduction was found to
reduce background noise significantly better than traditional Wiener
filtering and to virtually eliminate the "musical noise" often
associated with Wiener filtering.
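A simplified sketch of the parameter-domain idea, not the MRST implementation: apply a Wiener-style gain to each sinusoidal amplitude given a noise estimate near its frequency, then cull components whose gain stays near zero, so that no isolated low-level sinusoids survive to ring as musical noise. The noise_amp callable is an assumed helper.

```python
# Wiener-weight sinusoidal amplitudes, then drop weak components.
import numpy as np

def denoise_sinusoids(amps, freqs, noise_amp, floor=0.1):
    """amps, freqs: arrays for one frame; noise_amp(f): estimated noise
    amplitude near frequency f (assumed available)."""
    n = np.array([noise_amp(f) for f in freqs])
    snr = np.maximum(amps**2 - n**2, 0.0) / np.maximum(n**2, 1e-12)
    gain = snr / (1.0 + snr)               # Wiener gain per component
    keep = gain > floor                    # conditioning: cull weak outliers
    return amps[keep] * gain[keep], freqs[keep]
```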
SP-25.8
Utilizing Interband Acoustical Information for Modeling Stationary Time-Frequency Regions of Noisy Speech
Chang D Yoo (Korea Telecom)
A novel enhancement system is developed that exploits the properties of stationary regions
localized in both time and frequency. This system selects stationary time-frequency regions
and adaptively enhances each region according to its local signal-to-noise
ratio while utilizing both the acoustical knowledge of speech and the masking
properties of the human auditory system. Each region is enhanced for maximum
noise reduction while minimizing distortion. This paper evaluates the proposed
system through informal listening tests and some objective measures.
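As a crude sketch of region-wise processing (the paper's stationarity-based region selection and auditory-masking model are not reproduced), one can tile the STFT, estimate each tile's local SNR against a noise estimate, and attenuate low-SNR tiles more strongly:

```python
# Tile the spectrogram and apply an SNR-dependent gain per tile.
import numpy as np

def enhance_tf_regions(spec, noise_psd, t_tile=8, f_tile=16, floor=0.1):
    """spec: complex STFT (frames x bins); noise_psd: per-bin estimate."""
    out = spec.copy()
    T, F = spec.shape
    for t0 in range(0, T, t_tile):
        for f0 in range(0, F, f_tile):
            tile = spec[t0:t0 + t_tile, f0:f0 + f_tile]
            n = noise_psd[f0:f0 + f_tile].mean()
            snr = max(np.mean(np.abs(tile)**2) / n - 1.0, 0.0)  # local SNR
            gain = max(snr / (1.0 + snr), floor)
            out[t0:t0 + t_tile, f0:f0 + f_tile] = tile * gain
    return out
```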