Session: SPEECH-P10
Time: 3:30 - 5:30, Thursday, May 10, 2001
Location: Exhibit Hall Area 7
Title: Speech Enhancement 2
Chair: Bryan George

3:30, SPEECH-P10.1
CODEBOOK CONSTRAINED ITERATIVE NOISE CANCELLATION WITH APPLICATIONS TO SPEECH ENHANCEMENT
Y. GAO, J. LU, K. YU, B. XU
The performance of widely used adaptive noise canceling (ANC) deteriorates considerably when the desired signal leaks into the reference channel or when uncorrelated noise is present in the reference channel. This paper proposes a dual-microphone scheme, named Iterative Noise Canceling (INC), to overcome these drawbacks. In the proposed INC system, a codebook-based speech quality measure controls a modified iterative Wiener filter (MIWF), which automatically reduces the noise in the primary input until convergence. Compared with the traditional ANC algorithm, an evaluation using real noise and speech recorded in a car shows dramatically improved noise reduction, even when the reference SNR is close to 0 dB.
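
For reference, the traditional two-microphone ANC baseline that INC is compared against can be sketched as a normalized LMS (NLMS) canceller: an adaptive FIR filter predicts the primary-channel noise from the reference channel, and the prediction error serves as the speech estimate. This is a minimal sketch of the baseline only, not of INC or the MIWF; the filter order, step size `mu`, and the simulated acoustic path `h` are illustrative assumptions.

```python
import numpy as np


def nlms_anc(primary, reference, order=16, mu=0.1, eps=1e-8):
    """Two-microphone adaptive noise canceller using the NLMS algorithm.

    primary:   primary microphone signal (speech + noise)
    reference: reference microphone signal (noise correlated with the primary noise)
    Returns the error signal e[n], used as the speech estimate.
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(order)            # adaptive FIR weights
    buf = np.zeros(order)          # last `order` reference samples, newest first
    out = np.empty_like(primary)
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        noise_est = w @ buf                        # predicted primary-channel noise
        e = primary[n] - noise_est                 # error = speech estimate
        w += mu * e * buf / (eps + buf @ buf)      # normalized LMS weight update
        out[n] = e
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = np.arange(20000)
    speech = 0.5 * np.sin(2 * np.pi * 0.01 * n)    # stand-in for speech
    ref = rng.standard_normal(n.size)               # reference-channel noise
    h = np.array([0.8, -0.4, 0.2])                  # hypothetical acoustic path
    noise = np.convolve(ref, h)[: n.size]           # noise reaching the primary mic
    out = nlms_anc(speech + noise, ref)
    print("residual error power:", np.mean((out - speech)[-5000:] ** 2))
```

Because the weights adapt to anything in the primary channel that is correlated with the reference, speech leaking into the reference channel would also be partly cancelled, which is the failure mode the abstract describes.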

3:30, SPEECH-P10.2
USE OF LOCAL KURTOSIS MEASURE FOR SPOTTING USABLE SPEECH SEGMENTS IN CO-CHANNEL SPEECH
K. KRISHNAMACHARI , R. YANTORNO, J. LOVEKIN, D. BENINCASA , S. WENNDT
Recently, a novel method to process co-channel speech was proposed [1]. Previous methods enhance the target speech, suppress the interfering speech, or both enhance the target and suppress the interferer. The new method instead searches for usable speech frames, which under co-channel conditions are usually found in clusters. The term "usability" is context dependent, i.e., usable in the context of applications such as speaker identification or gisting. In this paper we investigate the use of kurtosis for spotting usable speech segments under co-channel conditions. Preliminary results reveal that a kurtosis of 1.5 or greater occurs close to the beginning and end of segments of usable speech, i.e., it usually brackets the usable speech segment. We observe that 92% of usable clusters are spotted for the male/male case, 83% for the male/female case, and 86% for the female/female case.
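
A minimal sketch of frame-wise kurtosis with the 1.5 threshold. The abstract does not say whether 1.5 refers to kurtosis or excess kurtosis; this sketch assumes excess kurtosis (0 for Gaussian data, -1.5 for a pure sinusoid). The frame length and hop are hypothetical, and the bracketing logic that groups flagged frames into usable segments is not reproduced.

```python
import numpy as np


def frame_excess_kurtosis(x, frame_len=256, hop=128):
    """Excess kurtosis E[(x-m)^4]/var^2 - 3 of each frame (0 for Gaussian data)."""
    x = np.asarray(x, dtype=float)
    values = []
    for start in range(0, len(x) - frame_len + 1, hop):
        d = x[start:start + frame_len]
        d = d - d.mean()
        var = np.mean(d ** 2)
        values.append(np.mean(d ** 4) / var ** 2 - 3.0 if var > 0 else 0.0)
    return np.array(values)


def candidate_boundary_frames(x, threshold=1.5, **frame_args):
    """Indices of frames whose kurtosis reaches the threshold (possible segment edges)."""
    return np.flatnonzero(frame_excess_kurtosis(x, **frame_args) >= threshold)
```

A steady tone never reaches the threshold, while impulsive, peaky frames easily exceed it.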

3:30, SPEECH-P10.3
RECURSIVELY UPDATED EIGENFILTERBANK FOR SPEECH ENHANCEMENT
M. JEPPESEN, C. RØDBRO, S. JENSEN
In this paper, a novel signal subspace method for speech enhancement is proposed. The algorithm is derived from the filterbank interpretation of the truncated (quotient) singular value decomposition (T(Q)SVD) algorithm. We derive a recursive version of this algorithm, which results in a recursively updated eigenfilterbank. The proposed method benefits from a low system delay and a low level of musical noise in the enhanced speech signal.
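
The underlying truncated-SVD idea can be sketched in batch form: embed a frame in a Hankel matrix, keep the `rank` dominant singular components, and average the anti-diagonals to get a signal back. This is a minimal sketch assuming white noise; it is neither the quotient SVD variant for colored noise nor the recursive eigenfilterbank the paper derives, and the Hankel row count is an illustrative choice.

```python
import numpy as np


def hankel_matrix(x, rows):
    """rows x (len(x)-rows+1) Hankel matrix with H[i, j] = x[i + j]."""
    cols = len(x) - rows + 1
    return np.array([x[i:i + cols] for i in range(rows)])


def average_antidiagonals(H):
    """Map a (possibly non-Hankel) matrix back to a signal by averaging its anti-diagonals."""
    rows, cols = H.shape
    total = np.zeros(rows + cols - 1)
    count = np.zeros(rows + cols - 1)
    for i in range(rows):
        total[i:i + cols] += H[i]
        count[i:i + cols] += 1
    return total / count


def tsvd_enhance(frame, rank, rows=None):
    """Batch truncated-SVD subspace enhancement of one frame (white-noise case)."""
    frame = np.asarray(frame, dtype=float)
    rows = rows or len(frame) // 2
    U, s, Vt = np.linalg.svd(hankel_matrix(frame, rows), full_matrices=False)
    H_k = (U[:, :rank] * s[:rank]) @ Vt[:rank]      # best rank-k approximation
    return average_antidiagonals(H_k)
```

A single real sinusoid yields a rank-2 Hankel matrix, so `rank=2` reproduces a clean tone and removes most of the white noise added to it.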

3:30, SPEECH-P10.4
STATISTICAL SPEECH RECONSTRUCTION AT THE PHONEME LEVEL
M. SAVIC, M. MOORE, C. SCOVILLE
Statistical methods for reconstructing speech at the phoneme level are used to find missing phonemes that have been removed from sentences in the TIMIT corpus. Probabilities for the occurrence of the missing phoneme(s) are generated, and the most likely candidate(s) are selected to reconstruct the sentence. The method uses symmetric and asymmetric ‘confidence windowing’ around the missing phoneme(s) to determine the most likely candidates. Reconstruction rates can exceed 85% for sequences with one or more missing phonemes.
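
A minimal sketch of the counting idea for a single missing phoneme, assuming context statistics are estimated from transcribed training sentences. The `left` and `right` window sizes play the role of symmetric or asymmetric context windows; the phone labels are illustrative TIMIT-style symbols, and the paper's actual probability model and confidence windowing may differ.

```python
from collections import Counter


def train_context_counts(sentences, left=1, right=1):
    """Count (left context, centre phoneme, right context) tuples."""
    counts = Counter()
    for seq in sentences:
        for i in range(left, len(seq) - right):
            key = (tuple(seq[i - left:i]), seq[i], tuple(seq[i + 1:i + 1 + right]))
            counts[key] += 1
    return counts


def rank_candidates(counts, left_ctx, right_ctx):
    """Candidate phonemes for the gap with their relative frequencies, most likely first."""
    scores = Counter()
    for (l, c, r), n in counts.items():
        if l == tuple(left_ctx) and r == tuple(right_ctx):
            scores[c] += n
    total = sum(scores.values())
    return [(c, n / total) for c, n in scores.most_common()] if total else []


train = [["sh", "iy", "hh", "ae", "d"],
         ["hh", "ae", "d", "y", "er"],
         ["hh", "iy", "d", "y", "er"]]
symmetric = train_context_counts(train, left=1, right=1)
print(rank_candidates(symmetric, ["hh"], ["d"]))
```

Setting `left=2, right=0` gives an asymmetric window that predicts the gap from the two preceding phonemes only.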

3:30, SPEECH-P10.5
ON SPEECH ENHANCEMENT UNDER SIGNAL PRESENCE UNCERTAINTY
I. COHEN
In this paper, we present an optimally modified log-spectral amplitude (OM-LSA) estimator, which minimizes the mean-square error of the log-spectra for speech signals under signal presence uncertainty. The spectral gain function is obtained as a weighted geometric mean of the hypothetical gains associated with signal presence and absence. The exponential weight of each hypothetical gain is its corresponding probability, conditioned on the observed signal. We introduce an efficient estimation approach for the a priori signal absence probability in each frequency bin, which exploits the strong correlation of speech presence in neighboring frequency bins of consecutive frames. Objective and subjective evaluations confirm its superiority in noise suppression and in the quality of the enhanced speech.
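
The weighted geometric mean can be sketched per frequency bin as G = G_H1^p · G_min^(1-p), where G_H1 is the Ephraim-Malah log-spectral amplitude gain, p is the conditional speech presence probability, and G_min is the gain applied under signal absence. This is a minimal sketch: the a priori absence probability `q` is passed in rather than estimated (the paper's estimator is not reproduced), `g_min = 0.1` is a hypothetical value, and E1 is computed by hand to avoid a SciPy dependency.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x, terms=100):
    """Exponential integral E1(x) for x > 0."""
    if x <= 0:
        raise ValueError("exp1 requires x > 0")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total, term = 0.0, 1.0
        for k in range(1, terms + 1):
            term *= -x / k                       # term == (-x)^k / k!
            total += term / k
        return -EULER_GAMMA - math.log(x) - total
    # Continued fraction E1(x) = e^-x / (x+1 - 1/(x+3 - 4/(x+5 - ...))), evaluated from the tail
    t = x + 2 * terms + 1
    for k in range(terms, 0, -1):
        t = x + 2 * k - 1 - k * k / t
    return math.exp(-x) / t


def lsa_gain(xi, gamma):
    """Log-spectral amplitude gain for a priori SNR xi and a posteriori SNR gamma."""
    v = gamma * xi / (1.0 + xi)
    return xi / (1.0 + xi) * math.exp(0.5 * exp1(v))


def presence_probability(xi, gamma, q):
    """Conditional speech presence probability given the a priori absence probability q."""
    v = gamma * xi / (1.0 + xi)
    return 1.0 / (1.0 + q / (1.0 - q) * (1.0 + xi) * math.exp(-v))


def om_lsa_gain(xi, gamma, q, g_min=0.1):
    """Geometric mean of the presence gain and the absence gain, weighted by p and 1 - p."""
    p = presence_probability(xi, gamma, q)
    return lsa_gain(xi, gamma) ** p * g_min ** (1.0 - p)
```

With `q = 0` the presence probability is 1 and the gain reduces to the plain LSA gain; as p drops, the gain is pulled toward `g_min`.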

3:30, SPEECH-P10.6
SPEECH ENHANCEMENT VIA FREQUENCY BANDWIDTH EXTENSION USING LINE SPECTRAL FREQUENCIES
S. CHENNOUKH, A. GERRITS, G. MIET, R. SLUIJTER
This paper contributes to narrowband speech enhancement by means of frequency bandwidth extension. A new algorithm is proposed for generating synthetic frequency components in the highband (i.e., 4-8 kHz) from the lowband ones (i.e., 0-4 kHz) for wideband speech synthesis. It is based on linear prediction (LPC) analysis-synthesis and combines an efficient extension of the spectral envelope using line spectral frequencies (LSF) with a bandwidth extension of the LPC analysis residual by spectral folding. The lowband LSF of the synthesized signal are obtained from the input speech signal, and the highband LSF are estimated from the lowband ones using statistical models. This estimation uses four models, which are distinguished by the first two reflection coefficients obtained from the linear prediction analysis of the input signal.
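
Spectral folding of the residual can be sketched as zero-insertion upsampling from 8 kHz to 16 kHz: the spectrum is mirrored about 4 kHz, so a component at f Hz also appears at 8000 - f Hz. This is a minimal sketch of the residual step only; the LSF envelope extension with four statistical models is not reproduced, and highband gain adjustment is omitted.

```python
import numpy as np


def spectral_fold(residual):
    """Upsample a narrowband residual by 2 with zero insertion (spectral folding).

    At the doubled sampling rate, the output spectrum is the input spectrum
    plus its mirror image about the old Nyquist frequency, so content at f Hz
    (fs = 8 kHz) also appears at 8000 - f Hz (fs = 16 kHz).
    """
    residual = np.asarray(residual, dtype=float)
    folded = np.zeros(2 * len(residual))
    folded[::2] = residual
    return folded
```

For example, a 1 kHz tone sampled at 8 kHz produces spectral peaks at 1 kHz and 7 kHz after folding.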

3:30, SPEECH-P10.7
A CROSS-CORRELATION TECHNIQUE FOR ENHANCING SPEECH CORRUPTED WITH CORRELATED NOISE
M. BHATNAGAR, Y. HU, P. LOIZOU
Most speech enhancement techniques do not perform well in correlated or colored noise, as they assume that noise and speech are uncorrelated. In this paper, we propose a method based on spectral subtraction that takes into account possible correlation between noise and speech. Objective measures showed that the proposed method outperformed the power spectral subtraction method, resulting in better speech quality and reduced levels of musical noise. Further improvements in speech quality were obtained by applying a perceptual weighting function (estimated using a psychoacoustic model) designed to minimize noise distortion.
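
The role of the neglected cross term can be made concrete per frequency bin: |Y|² = |X|² + |D|² + 2|X||D|cos θ, where θ is the phase difference between speech and noise. Solving this quadratic for |X| gives a subtraction rule that reduces to power spectral subtraction when cos θ = 0. This is a minimal sketch; the abstract does not say how the correlation term is estimated, so `cos_theta` is an input assumed to come from some estimator.

```python
import numpy as np


def subtract_with_cross_term(noisy_mag, noise_mag, cos_theta):
    """Per-bin magnitude estimate that keeps the speech/noise cross term.

    Solves |Y|^2 = |X|^2 + |D|^2 + 2|X||D|cos(theta) for |X|, taking the larger
    root (exact when |X| + |D|cos(theta) >= 0). With cos_theta = 0 this is
    ordinary power spectral subtraction.
    """
    Y = np.asarray(noisy_mag, dtype=float)
    D = np.asarray(noise_mag, dtype=float)
    c = np.asarray(cos_theta, dtype=float)
    disc = Y ** 2 - D ** 2 * (1.0 - c ** 2)       # equals (|X| + c|D|)^2 for exact inputs
    est = -c * D + np.sqrt(np.maximum(disc, 0.0))
    return np.maximum(est, 0.0)                   # half-wave rectification of negative estimates
```

With exact magnitudes and phase information the rule recovers |X| exactly; setting `cos_theta = 0` shows the bias that plain power subtraction incurs whenever speech and noise are correlated.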

3:30, SPEECH-P10.8
ESTIMATION OF THE EXCITATION VARIANCES OF SPEECH AND NOISE AR-MODELS FOR ENHANCED SPEECH CODING
M. KUROPATWINSKI, W. KLEIJN
In this paper, we consider the estimation of short-term predictor (STP) parameters under noisy conditions. The possible autoregressive spectral shapes of the speech and the additive noise are stored in AR-coefficient codebooks. The product codebook is then searched to maximize the likelihood function of the observed noisy speech frame. The maximum likelihood (ML) estimates of the variances of the driving terms are computed for each pair of speech and noise AR spectra. For further processing (e.g., Kalman filtering or speech coding using enhanced STP parameters), the spectra and variances that maximize the likelihood function are selected. To evaluate the proposed method, the estimated spectral shapes and variances are compared with those computed from the clean speech signal using a common spectral distortion measure. Because it globally maximizes the likelihood function over a restricted region of the parameter space, the approach provides robust estimates.
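
The product-codebook search can be sketched as follows: for each (speech, noise) pair of AR shapes, the noisy-frame power spectrum is modeled as σs²/|As|² + σn²/|An|², and the pair and variances with the highest likelihood are kept. This is a minimal sketch, not the authors' estimator: it substitutes the Whittle (frequency-domain) approximation of the Gaussian likelihood and a coarse grid search for the per-pair ML variance estimates, and the codebooks, variance grid, and frequency grid are hypothetical.

```python
import numpy as np


def ar_shape(a, omegas):
    """AR spectral shape 1/|A(e^{jw})|^2 for A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p."""
    coeffs = np.concatenate(([1.0], np.asarray(a, dtype=float)))
    A = np.exp(-1j * np.outer(omegas, np.arange(coeffs.size))) @ coeffs
    return 1.0 / np.abs(A) ** 2


def whittle_loglik(periodogram, model_psd):
    """Whittle approximation of the Gaussian log-likelihood, up to an additive constant."""
    return -np.sum(np.log(model_psd) + periodogram / model_psd)


def codebook_search(periodogram, omegas, speech_cb, noise_cb, var_grid):
    """Return (speech index, noise index, speech variance, noise variance) maximizing the likelihood."""
    best_ll, best = -np.inf, None
    noise_shapes = [ar_shape(a, omegas) for a in noise_cb]
    for i, a_s in enumerate(speech_cb):
        ps = ar_shape(a_s, omegas)
        for j, pn in enumerate(noise_shapes):
            for vs in var_grid:          # grid search stands in for the per-pair ML variance estimate
                for vn in var_grid:
                    ll = whittle_loglik(periodogram, vs * ps + vn * pn)
                    if ll > best_ll:
                        best_ll, best = ll, (i, j, vs, vn)
    return best
```

Because the Whittle criterion is maximized bin by bin when the model spectrum equals the observed one, a spectrum generated from one codebook pair and grid point is recovered exactly by the search.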