Session: SPEECH-L6
Time: 9:30 - 11:30, Thursday, May 10, 2001
Location: Room 151
Title: Speech Enhancement 1
Chair: Yunxin Zhao

9:30, SPEECH-L6.1
SPEECH ENHANCEMENT USING THE SPARSE CODE SHRINKAGE TECHNIQUE
I. POTAMITIS, N. FAKOTAKIS, G. KOKKINAKIS
Our work introduces the sparse code shrinkage (SCS) technique as a speech enhancement algorithm that aims at improving the quality of speech perception. SCS is a fairly new statistical technique originally presented to the applied mathematics and image denoising community, but, to our knowledge, its potential for speech enhancement has not yet been exploited. Its application to speech denoising gives rise to a conceptual framework that is quite different from the techniques dominating the speech enhancement domain. SCS originates in applying Independent Component Analysis (ICA) to a large ensemble of clean speech frames, revealing their underlying basis of statistically independent functions. Projecting the frames of a noisy speech signal onto this basis facilitates the application of Bayesian denoising to each of the resulting independent components individually. The maximum a posteriori (MAP) formulation leads to a soft threshold function, optimally adapted to the statistics of each independent component, which effectively reduces white and coloured Gaussian noise. Subsequently, an inverse transformation from the ICA-transformed domain back to the time domain reconstructs the enhanced signal.
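
The project, shrink, and reconstruct steps described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Laplacian prior on each component (for which the MAP estimate under Gaussian noise is a soft threshold at sqrt(2) * noise_var / prior_std), and it assumes the basis has been orthonormalized so that the inverse transform is the transpose. The names `soft_shrink` and `denoise_frame` are hypothetical.

```python
import math


def soft_shrink(y, noise_var, prior_std):
    """MAP estimate of s from y = s + n.

    Assumptions (not taken from the abstract): n is zero-mean Gaussian with
    variance noise_var, and s has a Laplacian density with standard
    deviation prior_std.
    """
    threshold = math.sqrt(2.0) * noise_var / prior_std
    return math.copysign(max(0.0, abs(y) - threshold), y)


def denoise_frame(frame, basis, noise_var, prior_stds):
    """Project a frame onto the basis rows, shrink, and transform back.

    basis is a list of orthonormal row vectors (an assumption), so the
    inverse transform is the transpose of the forward transform.
    """
    # Forward transform: one coefficient per basis function.
    coeffs = [sum(b * x for b, x in zip(row, frame)) for row in basis]
    # Shrink each component with its own prior statistics.
    shrunk = [soft_shrink(c, noise_var, d) for c, d in zip(coeffs, prior_stds)]
    # Inverse transform back to the time domain.
    return [
        sum(basis[k][i] * shrunk[k] for k in range(len(basis)))
        for i in range(len(frame))
    ]
```

In practice the basis and the per-component statistics would be estimated from clean speech frames, which this sketch does not attempt.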

9:50, SPEECH-L6.2
STFT-BASED MULTI-CHANNEL ACOUSTIC INTERFERENCE SUPPRESSOR
C. AVENDANO, G. GARCIA
In this paper we describe a system that suppresses the acoustic interference due to the coupling between the microphone and the loudspeakers of a hands-free multi-channel desktop audio system. The proposed system operates in the Short-Time Fourier Transform domain and uses spectral subtraction to suppress the unwanted interference, which consists of the local audio and the remote speech signal (echo). The interference estimate is obtained with a sub-band RLS-based adaptive multi-channel echo canceller. Test results show that under some adverse conditions and with low complexity constraints the system can achieve better and more consistent speech quality than a time-domain acoustic echo canceller.
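
Spectral subtraction on a single STFT frame can be sketched as below. This is a generic magnitude-domain variant with a spectral floor, given only as an illustration: the abstract does not state whether the system subtracts magnitudes or powers, and the `floor` parameter and the function name `spectral_subtract` are assumptions. The interference magnitudes are taken as given here; in the described system they would come from the adaptive echo canceller.

```python
import cmath


def spectral_subtract(mic_bins, interference_mag, floor=0.05):
    """Subtract an interference magnitude estimate from each STFT bin.

    mic_bins: complex STFT bins of the microphone frame.
    interference_mag: estimated interference magnitude per bin.
    floor: fraction of the original magnitude kept as a lower bound
           (an assumed safeguard against negative magnitudes).
    """
    out = []
    for y, d in zip(mic_bins, interference_mag):
        mag = abs(y)
        clean_mag = max(mag - d, floor * mag)
        # Reuse the microphone phase; only the magnitude is modified.
        out.append(cmath.rect(clean_mag, cmath.phase(y)))
    return out
```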

10:10, SPEECH-L6.3
ESTIMATION OF SPEECH EMBEDDED IN A REVERBERANT ENVIRONMENT WITH MULTIPLE SOURCES OF NOISE
A. BARROS, F. ITAKURA, T. RUTKOWSKI, A. MANSOUR, N. OHNISHI
In this work we propose a system for enhancement of the speech signal with highest energy from a linear convolutive mixture of n statistically independent sound sources recorded by m microphones, where m

10:30, SPEECH-L6.4
EXPERIMENTAL INVESTIGATION OF DELAYED INSTANTANEOUS DEMIXER FOR SPEECH ENHANCEMENT
Y. XIANG, Y. HUA, S. AN, A. ACERO
This paper presents a delayed instantaneous demixer (DID) for speech signal separation from real recordings. Based on the fact that the original signals are colored and mutually uncorrelated, a simple algorithm is derived to estimate the parameters of the demixer. This algorithm consists of two parts: a grid searching method to estimate time delays and an alternating projection method to estimate gain coefficients. Experimental results demonstrate the performance of the model and the algorithm.
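
The idea of a grid search over time delays can be illustrated with a small sketch. The abstract does not give the cost function used by the authors, so this example substitutes a plain cross-correlation score between two channels; it shows the exhaustive search over integer lags only, and the function name `estimate_delay` is hypothetical. The alternating projection step for the gain coefficients is not sketched.

```python
def estimate_delay(x1, x2, max_delay):
    """Return the integer lag d in [-max_delay, max_delay] that maximizes
    |sum_t x1[t] * x2[t + d]|.

    The cross-correlation score is an assumed stand-in for the paper's
    criterion, which the abstract does not specify.
    """
    best_d, best_score = 0, -1.0
    for d in range(-max_delay, max_delay + 1):
        score = abs(
            sum(x1[t] * x2[t + d] for t in range(len(x1)) if 0 <= t + d < len(x2))
        )
        if score > best_score:
            best_d, best_score = d, score
    return best_d
```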

10:50, SPEECH-L6.5
LATTICE-LADDER DECORRELATION FILTERS DEVELOPED FOR CO-CHANNEL SPEECH SEPARATION
K. YEN, Y. ZHAO
The previously proposed lattice-ladder adaptive decorrelation filtering (ADF) algorithm is further studied and improved, with the aim of developing a more efficient co-channel speech separation system. The effect of the joint linear predictions is analyzed, and the conversions between the lattice coefficients and the prediction and filter vectors are formulated. Implementation issues in the estimation of the lattice coefficients are discussed, and the adaptation equations are further refined. Experimental results demonstrate the effectiveness of the algorithm in reducing cross-interference between co-channel speech sources, as well as a significant performance improvement over the previous direct-form ADF algorithm. A simplified lattice-ladder ADF is also proposed as a compromise between computational cost and system performance.

11:10, SPEECH-L6.6
SINGLE CHANNEL SPEECH ENHANCEMENT USING MDL-BASED SUBSPACE APPROACH IN BARK DOMAIN
R. VETTER
In this paper we present a novel algorithm for single channel speech enhancement. It is based on a subspace approach in the Bark domain and an optimal subspace selection by the minimum description length (MDL) criterion. Processing in the Bark domain allows us to take the masking properties of the human auditory system into account in an optimal manner. The subspace selection provided by the MDL criterion overcomes the limitations encountered with other selection criteria, such as the overestimation of the signal-plus-noise subspace or the need for empirical parameters. Together, the resulting MDL-subspace approach in the Bark domain provides maximum noise reduction while minimizing signal distortions. The performance of our algorithm is assessed in white and colored noise. The results show that our algorithm provides high performance over a wide range of input signal-to-noise ratios.
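
Subspace selection by MDL can be illustrated with the classical eigenvalue-based form of the criterion. This is a sketch under assumptions, not the author's formulation: it uses the Wax-Kailath expression (log-ratio of geometric to arithmetic mean of the trailing eigenvalues, plus a 0.5 * k * (2p - k) * log N penalty), which may differ from the criterion in the paper, and it does not include any Bark-domain processing. The name `mdl_order` is hypothetical.

```python
import math


def mdl_order(eigvals, num_obs):
    """Pick the signal-subspace dimension k that minimizes an MDL cost.

    eigvals: positive eigenvalues of the observation covariance matrix.
    num_obs: number of observation vectors used to estimate it.
    The cost is the Wax-Kailath form (an assumption for this sketch).
    """
    lam = sorted(eigvals, reverse=True)
    p = len(lam)
    best_k, best_cost = 0, float("inf")
    for k in range(p):
        tail = lam[k:]  # eigenvalues attributed to the noise subspace
        m = len(tail)
        log_arith = math.log(sum(tail) / m)
        log_geo = sum(math.log(v) for v in tail) / m
        fit = -num_obs * m * (log_geo - log_arith)
        penalty = 0.5 * k * (2 * p - k) * math.log(num_obs)
        cost = fit + penalty
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

No empirical threshold appears in the selection: the penalty term depends only on the dimension and the number of observations, which is the property the abstract highlights.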