3:30, SPEECH-L11.1
MICROPHONE ARRAY SUB-BAND SPEECH RECOGNITION
I. MCCOWAN, S. SRIDHARAN
This paper proposes the integration of sub-band speech recognition with a microphone array. A broadband beamforming microphone array allows for natural integration with sub-band speech recognition, as the beamformer is typically implemented as a combination of band-limited sub-arrays. Rather than recombining the sub-array outputs to give a single enhanced output, we propose the fusion of separate hidden Markov models trained on each sub-array frequency band. In addition, a dynamic sub-band weighting scheme is proposed in which the cross- and auto-spectral densities of the microphone array inputs are used to estimate the reliability of each frequency band. The microphone array sub-band system is evaluated on an isolated digit recognition task and compared to the standard full-band approach. The results of the proposed dynamic weighting scheme are compared to those obtained using both fixed equal sub-band weights and optimal sub-band weights calculated from a priori knowledge of the correct results.
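One way to picture the fusion of sub-band models is a weighted sum of per-band log-likelihoods for each candidate word. The sketch below is illustrative only: the function names, the example scores, and the weights are hypothetical, and the abstract does not state that the authors combine scores in exactly this form.

```python
def fuse_subband_scores(log_likelihoods, weights):
    """Combine per-band log-likelihoods into one score per word.

    log_likelihoods: dict mapping word -> list of per-band log-likelihoods
                     (hypothetical values, e.g. one per sub-band HMM).
    weights:         list of non-negative per-band reliabilities; they are
                     normalized here so that they sum to one.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one sub-band weight must be positive")
    normalized = [w / total for w in weights]
    return {
        word: sum(w * ll for w, ll in zip(normalized, band_scores))
        for word, band_scores in log_likelihoods.items()
    }


def recognize(log_likelihoods, weights):
    """Return the word with the highest fused score."""
    fused = fuse_subband_scores(log_likelihoods, weights)
    return max(fused, key=fused.get)


# Hypothetical two-band scores for two candidate digits.
scores = {"one": [-10.0, -30.0], "two": [-20.0, -12.0]}
equal_choice = recognize(scores, [1.0, 1.0])      # fixed equal weights
weighted_choice = recognize(scores, [3.0, 1.0])   # first band trusted more
```

With fixed equal weights both bands count the same; raising the weight of a band that is judged more reliable can change which word wins, which is the effect a dynamic weighting scheme aims to exploit.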
3:50, SPEECH-L11.2
SPEECH ENHANCEMENT BY MULTIPLE BEAMFORMING WITH REFLECTION SIGNAL EQUALIZATION
T. NISHIURA, S. NAKAMURA, K. SHIKANO
In real environments, room reverberation seriously degrades the quality of sound capture. To solve this problem, multiple beamforming, which forms directivity not only in the direction of the desired sound source but also in the direction of reflection images, was proposed by J. Flanagan et al. However, it is difficult to apply this method in real environments, since it requires that the distortion of reflected sound signals by wall impedances be equalized. This paper proposes a new multiple beamforming algorithm that equalizes the amplitude spectrum and phase spectrum of reflection signals by a cross-spectrum method. Evaluation experiments are conducted in a real environment. In an SDR (Signal to Distortion Ratio) evaluation, the proposed multiple beamformer reduces signal distortion more effectively than the conventional single beamformer and the conventional multiple beamformer without equalization. In addition, in an ASR (Automatic Speech Recognition) evaluation, the equalized multiple beamformer achieves higher recognition performance than both conventional beamformers.
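The abstract does not give the exact SDR formula used. A commonly used definition is the ratio of the reference signal energy to the energy of the difference between the reference and the processed output, expressed in dB. The sketch below assumes that definition; the function name and the sample values are hypothetical.

```python
import math


def sdr_db(reference, processed):
    """Signal-to-distortion ratio in dB (assumed definition).

    SDR = 10 * log10( sum(r^2) / sum((r - p)^2) ), where r is the clean
    reference and p is the time-aligned beamformer output.
    """
    if len(reference) != len(processed):
        raise ValueError("signals must have the same length")
    signal_energy = sum(r * r for r in reference)
    if signal_energy == 0:
        raise ValueError("reference signal has zero energy")
    distortion_energy = sum((r - p) ** 2 for r, p in zip(reference, processed))
    if distortion_energy == 0:
        return math.inf  # output identical to the reference
    return 10.0 * math.log10(signal_energy / distortion_energy)


# Hypothetical samples: the output is the reference scaled by 0.9.
clean = [1.0, -1.0, 1.0, -1.0]
output = [0.9, -0.9, 0.9, -0.9]
example_sdr = sdr_db(clean, output)  # close to 20 dB
```

Under this definition a higher SDR means the processed signal is closer to the reference, so a beamformer that reduces distortion more effectively scores higher.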
4:10, SPEECH-L11.3
A MICROPHONE ARRAY-BASED 3-D N-BEST SEARCH ALGORITHM FOR THE SIMULTANEOUS RECOGNITION OF MULTIPLE SOUND SOURCES IN REAL ENVIRONMENTS
P. HERACLEOUS, S. NAKAMURA, K. SHIKANO
This paper deals with the recognition of distant-talking speech and, in particular, with the simultaneous recognition of multiple sound sources. A problem that must be solved in the recognition of distant-talking speech is talker localization. In some approaches, the talker is localized by using short- and long-term power. The 3-D Viterbi search-based method proposed by Yamada et al. integrates talker localization and speech recognition. This method provides high recognition rates, but its application is restricted to the presence of one talker. In order to deal with multiple talkers, we extended the 3-D Viterbi search method to a 3-D N-best search method, enabling the recognition of multiple sound sources. This paper describes our baseline 3-D N-best search-based system and two additional techniques, namely a likelihood normalization technique and a path distance-based clustering technique. The paper also describes experiments carried out to evaluate the performance of the system.
4:30, SPEECH-L11.4
MULTICHANNEL FILTERING FOR OPTIMUM NOISE REDUCTION IN MICROPHONE ARRAYS
D. FLORENCIO, H. MALVAR
This paper introduces a new optimization criterion for the design of microphone arrays, and derives an optimum filter based on this criterion. The algorithm computes two separate correlation matrices for the signal: one for when only background noise is present, and one for when both noise and signal are present. A filter is then computed based on these matrices, optimizing the proposed weighted mean-square error criterion. A block-recursive version of the algorithm is presented, using LMS-like adaptation of the multichannel filters, with a computational complexity under 40 MIPS for a typical application with four microphones. Simulation results with typical office noise show improvements of up to 20 dB in signal-to-noise ratio, even in low-noise environments.
4:50, SPEECH-L11.5
MICROPHONE ARRAY SPEECH DEREVERBERATION USING COARSE CHANNEL MODELING
S. GRIEBEL, M. BRANDSTEIN
This paper presents a model-based method for the enhancement of multi-channel speech acquired under reverberant conditions. A very coarse estimate of the channel responses associated with each source-microphone pair is derived directly from the received data on a short-term basis. These estimates are employed to modify the LPC residuals of the channel data in an effort to deemphasize the effects of reverberant energy in the resulting synthesized signal. The approach is robust to conditions of partial and approximate channel information. Specifically, the incorporated channel model requires only approximate times and amplitudes of the initial multi-path reflections. In practice, these impulses are responsible for the bulk of reverberant energy in the received speech signal and can be estimated to a sufficient degree on a time-varying basis.
5:10, SPEECH-L11.6
A MULTI-MICROPHONE SIGNAL SUBSPACE APPROACH FOR SPEECH ENHANCEMENT
F. JABLOUN, B. CHAMPAGNE
In this paper, we extend the single-microphone signal subspace approach for speech enhancement to a multi-microphone design. In the single-microphone case, the trade-off between speech quality and intelligibility is a handicap that limits the performance of the approach. This is because the approach is based on a linear speech model, which does not usually offer enough degrees of freedom for noise reduction. In our method, we show how we can easily, and with comparable computational complexity, obtain more degrees of freedom by using signals from more than one microphone. Experimental results show that this leads to improvements in noise reduction performance.