3:30, SPEECH-P10.1
CODEBOOK CONSTRAINED ITERATIVE NOISE CANCELLATION WITH APPLICATIONS TO SPEECH ENHANCEMENT
Y. GAO, J. LU, K. YU, B. XU
The performance of widely used adaptive noise canceling (ANC) deteriorates markedly when the desired signal leaks into the reference channel or when uncorrelated noise is present in the reference channel. This paper proposes a dual-microphone scheme, named Iterative Noise Canceling (INC), to overcome these drawbacks. The proposed INC system, in which a codebook-based speech quality measure controls a modified iterative Wiener filter (MIWF), automatically reduces noise in the primary input until convergence occurs. An evaluation using real noise and voices recorded in a car shows that, compared with the traditional ANC algorithm, noise reduction performance is dramatically improved, even when the reference SNR is close to 0 dB.
3:30, SPEECH-P10.2
USE OF LOCAL KURTOSIS MEASURE FOR SPOTTING USABLE SPEECH SEGMENTS IN CO-CHANNEL SPEECH
K. KRISHNAMACHARI, R. YANTORNO, J. LOVEKIN, D. BENINCASA, S. WENNDT
Recently, a novel method to process co-channel speech was proposed [1]. Previous methods enhance the target speech, suppress the interfering speech, or do both. The new method searches for usable speech frames, which are usually found in clusters under co-channel conditions. The term "usability" is context dependent, i.e., usable in the context of an application such as speaker identification or gisting. In this paper we investigate the use of kurtosis for spotting usable speech segments under co-channel conditions. Preliminary results reveal that a kurtosis of 1.5 or greater occurs close to the beginnings and ends of segments of usable speech, i.e., such values usually bracket the usable speech segment. For the Male/Male case, 92% of usable clusters are spotted; for the Male/Female case, 83%; and for the Female/Female case, 86%.
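The kurtosis test described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the "local" kurtosis is computed, so the code below assumes the plain (non-excess) sample kurtosis of the time samples in non-overlapping frames. The function names `kurtosis` and `flag_frames` and the `frame_len` parameter are hypothetical. Only the 1.5 threshold comes from the abstract.

```python
def kurtosis(frame):
    """Plain sample kurtosis m4 / m2**2 (assumption: not the excess form)."""
    n = len(frame)
    mean = sum(frame) / n
    m2 = sum((v - mean) ** 2 for v in frame) / n
    if m2 == 0.0:
        return 0.0  # constant frame: define kurtosis as 0 to avoid dividing by zero
    m4 = sum((v - mean) ** 4 for v in frame) / n
    return m4 / (m2 * m2)


def flag_frames(signal, frame_len, threshold=1.5):
    """Return one boolean per non-overlapping frame: kurtosis >= threshold."""
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        flags.append(kurtosis(frame) >= threshold)
    return flags
```

Under these assumptions, frames flagged `True` would mark candidate boundaries of a usable segment. Grouping the flags into bracketed clusters is left out because the abstract does not describe that step.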
3:30, SPEECH-P10.3
RECURSIVELY UPDATED EIGENFILTERBANK FOR SPEECH ENHANCEMENT
M. JEPPESEN, C. RØDBRO, S. JENSEN
In this paper a novel signal subspace method for speech enhancement
is proposed. The algorithm is derived from the filterbank
interpretation of the truncated (quotient) singular value
decomposition (T(Q)SVD) algorithm. We derive a recursive version of
this algorithm which results in a recursively updated
eigenfilterbank. The proposed method benefits from a low system
delay and a low amount of musical noise in the enhanced speech
signal.
3:30, SPEECH-P10.4
STATISTICAL SPEECH RECONSTRUCTION AT THE PHONEME LEVEL
M. SAVIC, M. MOORE, C. SCOVILLE
Statistical methods for reconstructing speech at the phoneme level are used to find missing phonemes that have been removed from sentences in the TIMIT corpus. Probabilities for the occurrence of the missing phoneme(s) are generated, and the most likely candidate(s) are selected to reconstruct the sentence. The method includes symmetric and asymmetric ‘confidence windowing’ around the missing phoneme(s) to determine the most likely candidates. Reconstruction rates for one or more phonemes missing in a sequence can exceed 85%.
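As a rough illustration of picking the most likely missing phoneme from its context, the sketch below counts which phoneme appears between a given left and right context in training sequences and returns the most frequent one. This is an assumed, simplified stand-in for the 'confidence windowing' mentioned in the abstract, whose details are not given. The names `train_context_counts` and `fill_missing` and the `left`/`right` window sizes are hypothetical. Setting `left != right` gives an asymmetric window.

```python
from collections import Counter


def train_context_counts(sequences, left=1, right=1):
    """Count the phonemes seen between each (left context, right context) pair."""
    counts = {}
    for seq in sequences:
        for i in range(left, len(seq) - right):
            context = (tuple(seq[i - left:i]), tuple(seq[i + 1:i + 1 + right]))
            counts.setdefault(context, Counter())[seq[i]] += 1
    return counts


def fill_missing(seq, index, counts, left=1, right=1):
    """Return the most frequent phoneme for the context around seq[index], or None."""
    if index < left or index + right >= len(seq):
        return None  # not enough context inside the window
    context = (tuple(seq[index - left:index]), tuple(seq[index + 1:index + 1 + right]))
    candidates = counts.get(context)
    if not candidates:
        return None  # context never seen in training
    return candidates.most_common(1)[0][0]
```

For example, after training on the toy sequences `k ae t`, `k ae t`, and `k ah t`, the gap in `k ? t` would be filled with `ae`.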
3:30, SPEECH-P10.5
ON SPEECH ENHANCEMENT UNDER SIGNAL PRESENCE UNCERTAINTY
I. COHEN
In this paper, we present an optimally-modified Log-Spectral
Amplitude estimator, which minimizes the mean-square error of the
log-spectra for speech signals under signal presence uncertainty.
The spectral gain function is obtained as a weighted geometric mean
of the hypothetical gains associated with signal presence and
absence. The exponential weight of each hypothetical gain is its
corresponding probability, conditioned on the observed signal. We
introduce an efficient estimation approach for the a priori signal
absence probability in each frequency bin, which exploits the
strong correlation of speech presence in neighboring frequency bins
of consecutive frames. Objective and subjective evaluations confirm
superiority in noise suppression and quality of the enhanced
speech.
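The weighted geometric mean described in this abstract can be written as G = G_present^p * G_absent^(1-p), where p is the conditional signal presence probability. The sketch below evaluates that expression for a single frequency bin. The function and parameter names are hypothetical, and how the two hypothetical gains and the probability are estimated is outside the scope of this sketch.

```python
def combined_gain(gain_present, gain_absent, presence_prob):
    """Weighted geometric mean of the gains under signal presence and absence.

    Each gain is raised to the probability of its own hypothesis.
    Both gains are assumed to be strictly positive.
    """
    if not 0.0 <= presence_prob <= 1.0:
        raise ValueError("presence_prob must lie in [0, 1]")
    if gain_present <= 0.0 or gain_absent <= 0.0:
        raise ValueError("gains must be strictly positive")
    return (gain_present ** presence_prob) * (gain_absent ** (1.0 - presence_prob))
```

When the presence probability is 1 the result is the presence gain alone, and when it is 0 the result falls back to the absence gain.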
3:30, SPEECH-P10.6
SPEECH ENHANCEMENT VIA FREQUENCY BANDWIDTH EXTENSION USING LINE SPECTRAL FREQUENCIES
S. CHENNOUKH, A. GERRITS, G. MIET, R. SLUIJTER
This paper contributes to narrowband speech enhancement by means of frequency bandwidth extension. A new algorithm is proposed for generating synthetic frequency components in the highband (i.e., 4-8 kHz) given the lowband ones (i.e., 0-4 kHz) for wideband speech synthesis. It is based on linear prediction (LPC) analysis-synthesis and consists of a spectral envelope extension that makes efficient use of line spectral frequencies (LSF) and a bandwidth extension of the LPC analysis residual using spectral folding. The lowband LSF of the synthesis signal are obtained from the input speech signal, and the highband LSF are estimated from the lowband ones using statistical models. This estimation relies on four models that are distinguished by the first two reflection coefficients obtained from linear prediction analysis of the input signal.
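Spectral folding of a residual is commonly realized by inserting a zero after every sample, which doubles the sampling rate and mirrors the lowband spectrum into the highband. The sketch below assumes that realization, which the abstract does not confirm, and uses a direct DFT so that the mirroring can be checked. The names `zero_stuff` and `dft_magnitudes` are hypothetical.

```python
import cmath


def zero_stuff(x):
    """Upsample by 2 by inserting a zero after every sample (no filtering)."""
    y = []
    for v in x:
        y.append(v)
        y.append(0.0)
    return y


def dft_magnitudes(x):
    """Magnitudes of the direct O(n^2) DFT, adequate for short illustrative frames."""
    n = len(x)
    mags = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags
```

For a real input of length N, the output has length 2N, and the magnitude at bin N - k equals the magnitude at bin k. With an 8 kHz input resampled to 16 kHz this places a mirror image of the 0-4 kHz band in the 4-8 kHz band.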
3:30, SPEECH-P10.7
A CROSS-CORRELATION TECHNIQUE FOR ENHANCING SPEECH CORRUPTED WITH CORRELATED NOISE
M. BHATNAGAR, Y. HU, P. LOIZOU
Most speech enhancement techniques do not perform very well in correlated or colored noise, as they assume that noise and speech are not correlated. In this paper, we propose a method, based on spectral subtraction, which takes into account possible correlation between noise and speech. Objective measures showed that the proposed method outperformed the power spectral subtraction method, resulting in better speech quality and reduced levels of musical noise. Further enhancements in speech quality were obtained by applying a perceptual weighting function (estimated using a psychoacoustic model) that was designed to minimize noise distortion.
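The assumption that this abstract questions can be made concrete for a single frequency bin. If the noisy spectrum is Y = X + D, then |Y|^2 = |X|^2 + |D|^2 + 2 Re(X D*), and plain power spectral subtraction ignores the last (cross) term. The sketch below only illustrates this identity. It is not the proposed method, whose estimate of the cross term is not described in the abstract, and all names are hypothetical.

```python
def power_terms(speech_bin, noise_bin):
    """Return (|Y|^2, |X|^2, |D|^2, cross term) for Y = X + D in one bin."""
    noisy_power = abs(speech_bin + noise_bin) ** 2
    speech_power = abs(speech_bin) ** 2
    noise_power = abs(noise_bin) ** 2
    cross = 2.0 * (speech_bin * noise_bin.conjugate()).real
    return noisy_power, speech_power, noise_power, cross


def power_spectral_subtraction(noisy_power, noise_power, floor=0.0):
    """Classic estimate of |X|^2 that assumes the cross term is zero."""
    return max(noisy_power - noise_power, floor)
```

With a speech bin of 1 and a fully correlated noise bin of 0.5, the cross term is 1.0, so the classic estimate returns 2.0 instead of the true speech power of 1.0.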
3:30, SPEECH-P10.8
ESTIMATION OF THE EXCITATION VARIANCES OF SPEECH AND NOISE AR-MODELS FOR ENHANCED SPEECH CODING
M. KUROPATWINSKI, W. KLEIJN
In this paper, we consider the estimation of short-term predictor
(STP) parameters under noisy conditions. The possible autoregressive
spectral shapes of the speech and additive noise are stored in
AR-coefficient codebooks. The product codebook is then searched to
maximize the likelihood function of the observed noisy speech signal
frame. The Maximum Likelihood (ML) estimates of the variances of the
driving term are computed for each pair of the speech and noise AR
spectra. For further processing (e.g., Kalman filtering or speech
coding using enhanced STP parameters), the spectra and variances that
yield the maximum of the likelihood function are selected. To evaluate
the proposed method, the estimates of the spectral shapes and
variances are compared with those computed from the clean speech signal
using a common spectral distortion measure. By globally maximizing the
likelihood function over some restricted region of the parameter
space, the presented approach provides robust estimates.