9:30, SPEECH-L6.1
SPEECH ENHANCEMENT USING THE SPARSE CODE SHRINKAGE TECHNIQUE
I. POTAMITIS, N. FAKOTAKIS, G. KOKKINAKIS
Our work introduces the sparse code shrinkage (SCS) technique as a speech enhancement algorithm that aims at improving the quality of speech perception. SCS is a fairly new statistical technique originally presented to the applied mathematics and image denoising communities but, to our knowledge, its potential for speech enhancement has not yet been exploited. Its application to speech denoising gives rise to a conceptual framework that is quite different from the techniques dominating the speech enhancement domain. SCS originates in applying Independent Component Analysis (ICA) to a large ensemble of clean speech frames, revealing their underlying basis of statistically independent functions. Projecting the frames composing a noisy speech signal onto this basis facilitates the application of Bayesian denoising to each of the resulting independent components individually. The maximum a posteriori (MAP) formulation leads to a soft threshold function, optimally adapted to the statistics of each independent component, which effectively reduces white and coloured Gaussian noise. Subsequently, an inverse transformation from the ICA-transformed domain back to the time domain reconstructs the enhanced signal.
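The project, shrink, and reconstruct steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each component follows a Laplacian density (for which the MAP estimator in Gaussian noise is a soft threshold), that the basis is orthonormal so its transpose is its inverse, and that the noise variance is known. The names `laplace_shrink`, `denoise_frame`, `noise_var` and `comp_stds` are hypothetical.

```python
import math


def laplace_shrink(u, noise_var, comp_std):
    """Soft threshold: MAP estimate of a Laplacian component observed in Gaussian noise.

    Assumption: the component has standard deviation `comp_std` and the
    additive Gaussian noise has variance `noise_var`.
    """
    threshold = math.sqrt(2.0) * noise_var / comp_std
    return math.copysign(max(0.0, abs(u) - threshold), u)


def denoise_frame(frame, basis, noise_var, comp_stds):
    """Project a frame onto an orthonormal basis, shrink each component, project back.

    `basis` is a list of rows, one per basis function (assumed orthonormal,
    e.g. an orthogonalised ICA basis estimated beforehand on clean speech).
    """
    # Forward transform: one coefficient per basis function.
    coeffs = [sum(w * x for w, x in zip(row, frame)) for row in basis]
    # Component-wise shrinkage, each with its own statistics.
    shrunk = [laplace_shrink(c, noise_var, d) for c, d in zip(coeffs, comp_stds)]
    # Inverse transform: for an orthonormal basis this is the transpose.
    return [
        sum(basis[k][i] * shrunk[k] for k in range(len(basis)))
        for i in range(len(frame))
    ]
```

Small coefficients, which are more likely to be noise than speech under the sparse prior, are set to zero, while large ones are reduced by a fixed amount that grows with the noise variance.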
9:50, SPEECH-L6.2
STFT-BASED MULTI-CHANNEL ACOUSTIC INTERFERENCE SUPPRESSOR
C. AVENDANO, G. GARCIA
In this paper we describe a system that suppresses the acoustic interference due to the coupling between the microphone and the loudspeakers of a hands-free multi-channel desktop audio system. The proposed system operates in the Short-Time Fourier Transform domain and uses spectral subtraction to suppress the unwanted interference, which consists of the local audio and the remote speech signal (echo). The interference estimate is obtained with a sub-band RLS-based adaptive multi-channel echo canceller. Test results show that under some adverse conditions and with low complexity constraints the system can achieve better and more consistent speech quality than a time-domain acoustic echo canceller.
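The spectral subtraction step mentioned in this abstract can be illustrated per STFT frame as below. This is only a sketch under stated assumptions: the interference magnitude estimate is taken as given (in the paper it comes from a sub-band RLS-based adaptive echo canceller, which is not reproduced here), magnitude subtraction is used, and a spectral floor limits the attenuation. The function name `spectral_subtract` and the `floor` parameter are hypothetical.

```python
def spectral_subtract(mic_bins, interference_mag, floor=0.05):
    """Attenuate each complex STFT bin by the estimated interference magnitude.

    mic_bins         : complex STFT bins of one microphone frame
    interference_mag : estimated interference magnitude per bin (assumed given)
    floor            : minimum gain, keeps the output from being over-subtracted
    """
    out = []
    for y, d in zip(mic_bins, interference_mag):
        mag = abs(y)
        if mag == 0.0:
            out.append(0j)  # nothing to scale in an empty bin
            continue
        # Magnitude subtraction expressed as a real gain; the phase of y is kept.
        gain = max(1.0 - d / mag, floor)
        out.append(gain * y)
    return out
```

Because the gain is real and non-negative, the noisy phase is reused unchanged, which is the usual choice in spectral subtraction.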
10:10, SPEECH-L6.3
10:30, SPEECH-L6.4
10:50, SPEECH-L6.5
11:10, SPEECH-L6.6
ESTIMATION OF SPEECH EMBEDDED IN A REVERBERANT ENVIRONMENT WITH MULTIPLE SOURCES OF NOISE
A. BARROS, F. ITAKURA, T. RUTKOWSKI, A. MANSOUR, N. OHNISHI
In this work we propose a system for enhancement of the speech signal with highest energy from a linear convolutive mixture of n statistically independent sound sources recorded by m microphones, where m
EXPERIMENTAL INVESTIGATION OF DELAYED INSTANTANEOUS DEMIXER FOR SPEECH ENHANCEMENT
Y. XIANG, Y. HUA, S. AN, A. ACERO
This paper presents a delayed instantaneous demixer (DID) for speech signal separation from real recordings. Based on the fact that the original signals are colored and mutually uncorrelated, a simple algorithm is derived to estimate the parameters of the demixer. This algorithm consists of two parts: a grid search method to estimate time delays and an alternating projection method to estimate gain coefficients. Experimental results demonstrate the performance of the model and the algorithm.
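As a rough illustration of what a grid search over time delays looks like, the sketch below scans integer lags and keeps the one with the largest absolute cross-correlation between two recordings. This is a generic simplification, not the criterion derived in the paper, and it does not cover the alternating projection step for the gain coefficients. The function name `estimate_delay` and the `max_lag` parameter are hypothetical.

```python
def estimate_delay(x, y, max_lag):
    """Return the integer lag in [-max_lag, max_lag] that maximises
    |sum_t x[t] * y[t + lag]| (a plain cross-correlation grid search)."""
    best_lag, best_score = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(len(x)):
            u = t + lag
            if 0 <= u < len(y):  # ignore samples that fall outside y
                score += x[t] * y[u]
        if abs(score) > best_score:
            best_lag, best_score = lag, abs(score)
    return best_lag
```

A positive result means that `y` lags behind `x` by that many samples. The cost grows linearly with the number of candidate lags, so the search range should be bounded by the largest delay the microphone spacing allows.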
LATTICE-LADDER DECORRELATION FILTERS DEVELOPED FOR CO-CHANNEL SPEECH SEPARATION
K. YEN, Y. ZHAO
The previously proposed lattice-ladder adaptive decorrelation
filtering (ADF) algorithm is further studied and improved, with the
aim of developing a more efficient co-channel speech separation
system. The effect of the joint linear predictions is analyzed and
the conversions between the lattice coefficients and the prediction
and filter vectors are formulated. Implementation issues in the
estimation of lattice coefficients are discussed and the adaptation
equations are further refined. Experimental results demonstrate the
effectiveness of the algorithm in reducing cross-interference between
co-channel speech sources as well as the significant performance
improvement over the previous direct-form ADF algorithm. A simplified
lattice-ladder ADF is also proposed as a compromise between
computational cost and system performance.
SINGLE CHANNEL SPEECH ENHANCEMENT USING MDL-BASED SUBSPACE APPROACH IN BARK DOMAIN
R. VETTER
We present in this paper a novel algorithm for single channel
speech enhancement. It is based on a subspace approach in the Bark
domain and an optimal subspace selection by the minimum
description length (MDL) criterion. The processing in the Bark
domain allows us to take into account in an optimal manner the
masking properties of the human auditory system. The subspace
selection provided by the MDL criterion overcomes the limitations
encountered with other selection criteria, like the
overestimation of the signal-plus-noise subspace or the need
for empirical parameters. Together, the resulting MDL-subspace
approach in the Bark domain provides maximum noise reduction
while minimizing signal distortions. The performance of our
algorithm is assessed in white and colored noise. The results show
that it provides high performance over a wide range of input
signal-to-noise ratios.
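To make the idea of MDL-based subspace selection concrete, the sketch below picks the signal subspace dimension from a set of covariance eigenvalues using the classical Wax-Kailath form of the MDL criterion. This is an assumption for illustration: the abstract does not state the exact criterion or how it is applied in the Bark domain. The function name `mdl_order` and the parameter `n_obs` (number of observation vectors) are hypothetical, and all eigenvalues are assumed to be strictly positive.

```python
import math


def mdl_order(eigvals, n_obs):
    """Return the subspace dimension k that minimises the MDL cost.

    For each candidate k, the p - k smallest eigenvalues are treated as noise;
    the cost compares their geometric and arithmetic means and adds a penalty
    that grows with the number of free parameters.
    """
    lam = sorted(eigvals, reverse=True)
    p = len(lam)
    best_k, best_cost = 0, float("inf")
    for k in range(p):
        tail = lam[k:]  # candidate noise eigenvalues
        m = p - k
        log_geo = sum(math.log(v) for v in tail) / m
        log_arith = math.log(sum(tail) / m)
        # The log-likelihood term is zero when the tail eigenvalues are all equal.
        cost = -n_obs * m * (log_geo - log_arith)
        cost += 0.5 * k * (2 * p - k) * math.log(n_obs)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

No threshold has to be tuned by hand: the penalty term alone decides where the signal-plus-noise subspace ends, which is the property the abstract contrasts with selection criteria that need empirical parameters.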