Session: SAM-P7
Time: 9:30 - 11:30, Friday, May 11, 2001
Location: Exhibit Hall Area 4
Title: Signal Processing with Microphone Arrays
Chair: Jacob Gunther

9:30, SAM-P7.1
NONLINEAR FILTERING FOR SPEAKER TRACKING IN NOISY AND REVERBERANT ENVIRONMENTS
J. VERMAAK, A. BLAKE
This paper addresses the problem of speaker tracking in a noisy and reverberant environment using time delay of arrival (TDOA) measurements at spatially distributed microphone pairs. The tracking problem is posed within a state-space estimation framework, and models are developed for the speaker motion and the likelihood of the speaker location in the light of the TDOA measurements. The resulting state-space model is non-linear and non-Gaussian, and consequently no closed-form solutions exist for the filtering distributions required to perform tracking. Here Sequential Monte Carlo (SMC) methods are applied to approximate the true filtering distribution with a set of samples. The resulting tracking algorithm requires no triangulation, is computationally efficient, and can straightforwardly be extended to track multiple speakers.
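
The sample-based filtering idea can be illustrated with a minimal bootstrap particle filter. This is a sketch under simplifying assumptions that are not taken from the paper: the speaker moves along a line at a known distance y from the microphones, the motion model is a random walk, and the TDOA likelihood is Gaussian. All function names and parameter values (tdoa, particle_filter_step, track, motion_std, tdoa_std) are hypothetical.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s


def tdoa(x, y, mic_a, mic_b):
    """Time delay of arrival (s) between mic_a and mic_b for a source at (x, y)."""
    da = math.hypot(x - mic_a[0], y - mic_a[1])
    db = math.hypot(x - mic_b[0], y - mic_b[1])
    return (da - db) / SPEED_OF_SOUND


def particle_filter_step(particles, measurements, pairs, y, rng,
                         motion_std=0.05, tdoa_std=2e-5):
    """One predict / weight / resample cycle; returns (new particles, estimate)."""
    # Predict: random-walk motion model (an assumption of this sketch).
    moved = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Weight: Gaussian TDOA likelihood (assumed), accumulated in the log domain.
    log_w = []
    for p in moved:
        lw = 0.0
        for (mic_a, mic_b), z in zip(pairs, measurements):
            err = z - tdoa(p, y, mic_a, mic_b)
            lw += -0.5 * (err / tdoa_std) ** 2
        log_w.append(lw)
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Estimate: weighted mean of the samples; no triangulation step is needed.
    estimate = sum(w * p for w, p in zip(weights, moved))
    # Resample in proportion to the weights.
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    return resampled, estimate


def track(measurement_seq, pairs, y, n_particles=1000,
          x_range=(0.0, 3.0), seed=0):
    """Run the filter over a sequence of per-pair TDOA measurements."""
    rng = random.Random(seed)
    particles = [rng.uniform(*x_range) for _ in range(n_particles)]
    estimates = []
    for z in measurement_seq:
        particles, est = particle_filter_step(particles, z, pairs, y, rng)
        estimates.append(est)
    return estimates
```

Each element of measurement_seq holds one TDOA value per microphone pair, in the same order as pairs. The estimate is read directly from the weighted samples, which is why no explicit triangulation appears anywhere in the loop.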

9:30, SAM-P7.2
A ROBUST SPEECH DETECTION ALGORITHM IN A MICROPHONE ARRAY TELECONFERENCING SYSTEM
Q. ZOU, X. ZOU, M. ZHANG, Z. LIN
This paper describes a robust speech detection algorithm that can operate reliably in a microphone array teleconferencing system. High performance in a non-stationary noisy environment is achieved by combining the following techniques: (1) noise suppression by spectral subtraction, (2) silence detection with an adaptive noise threshold, and (3) non-stationary noise detection based on the availability of a pitch signal. This algorithm can prevent the microphone-array-based speaker tracking system from being misled by the noises commonly present in a conference room. Real-world experiments show that this algorithm performs very well and has the potential for practical applications.
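
Technique (2), silence detection with an adaptive noise threshold, can be sketched as follows. This is only an illustration, not the authors' algorithm: it assumes per-frame energies are already available, that the first few frames contain noise only, and that the noise floor is tracked with a simple exponential average. The function name and the parameters margin, adapt and init_frames are hypothetical.

```python
def detect_speech(frame_energies, margin=3.0, adapt=0.1, init_frames=5):
    """Return one boolean per frame: True where energy exceeds the threshold.

    The threshold is margin times a running noise-floor estimate, which is
    updated only on frames classified as non-speech.
    """
    if len(frame_energies) < init_frames:
        raise ValueError("need at least init_frames frames")
    # Assumption: the first init_frames frames are noise only.
    noise = sum(frame_energies[:init_frames]) / init_frames
    flags = []
    for energy in frame_energies:
        is_speech = energy > margin * noise
        if not is_speech:
            # Track slowly varying background noise during silence.
            noise = (1.0 - adapt) * noise + adapt * energy
        flags.append(is_speech)
    return flags
```

Because the noise floor follows the background level, the same frame energy can be flagged as speech in a quiet room and as silence once the background has become louder.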

9:30, SAM-P7.3
AN EXPERIMENT THAT VALIDATES THEORY WITH MEASUREMENTS FOR A LARGE-APERTURE MICROPHONE ARRAY
H. SILVERMAN, W. PATTERSON III, J. SACHAR
Poor sound pickup by remote microphones in multimedia applications, conference rooms and auditoria has traditionally hampered speech recognition and communication among spatially separated groups. The problems are reverberation, acoustic noise, and the variability of the radiation pattern of unconstrained talkers. One potential solution that is becoming increasingly practical is to use an array of microphones and sophisticated signal processing. In this paper a brief description of a large, real-time, working system is presented and its measured beamforming performance is compared to what is predicted from a mathematical model. A combination of a synchronized test signal/system and a careful mathematical model yields measured and predicted performance that match surprisingly well. From this match of theory to practice, we are able to draw some important inferences about future system improvements.

9:30, SAM-P7.4
DIRECTION OF ARRIVAL ESTIMATION BASED ON NONLINEAR MICROPHONE ARRAY
H. KAMIYANAGIDA, H. SARUWATARI, K. TAKEDA, F. ITAKURA
This paper describes a new method for estimating the direction of arrival (DOA) using a nonlinear microphone array based on complementary beamforming. Complementary beamforming is based on two types of beamformers designed to have mutually complementary directivity patterns. In this system, since the resultant directivity pattern is proportional to the product of these directivity patterns, the proposed method can be used to estimate DOAs even when the number of sound sources is equal to or exceeds that of microphones. First, DOA-estimation experiments are performed using actual devices in real acoustic environments. The results show that DOA estimation for two sound sources can be accomplished by the proposed method with only two microphones. Also, by comparing the resolutions of DOA estimation by the proposed method and by the conventional minimum variance method, we show that the performance of the proposed method is superior to that of the conventional method.

9:30, SAM-P7.5
BLIND DECONVOLUTION OF REVERBERATED SPEECH SIGNALS VIA REGULARIZATION
J. LIU, H. MALVAR
This paper explores blind deconvolution of reverberated speech signals in microphone array applications. Two regularization approaches are proposed based on available a priori knowledge. The regularized least-squares (LS) approach uses the speech signal characteristics and the lowpass nature of the reverberation channel, while the regularized cross-correlation (CR) approach requires more precise knowledge of the reverberation, which can be obtained through training. The two methods are robust to the presence of noise.

9:30, SAM-P7.6
ESTIMATION OF SOURCE LOCATION BASED ON 2-D MUSIC AND ITS APPLICATION TO SPEECH RECOGNITION IN CARS
T. NAGAI, K. KONDO, M. KANEKO, A. KUREMATSU
This paper proposes a speech recognition and enhancement system for noisy car environments based on a microphone array. In the system, multiple microphones are arranged in 2-dimensional space, surrounding the interior of a car, and the speaker's location is first estimated by our proposed HE (Harmonic Enhanced) 2-D MUSIC (MUltiple SIgnal Classification). Then, 2-D Delay and Sum (DS) is applied to enhance the target speech. Such pre-processing makes robust speech recognition in noisy car environments possible. In the proposed system, not only the driver but also a passenger can control car electronics by voice, no matter where they are seated. This is a further advantage of the system. The results of the simulation and the preliminary experiment in a real car environment are presented to confirm the validity of our proposed system.
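
The delay-and-sum step can be sketched in a few lines. This is a generic illustration rather than the 2-D DS implementation of the paper: it assumes the target's arrival delay at each microphone is already known (for example from a preceding localization stage) and is a whole number of samples. The names delay_signal and delay_and_sum are hypothetical.

```python
def delay_signal(signal, delay):
    """Delay a signal by a whole number of samples, padding with zeros."""
    n = len(signal)
    return [signal[t - delay] if 0 <= t - delay < n else 0.0 for t in range(n)]


def delay_and_sum(channels, delays):
    """Align each channel on the target and average.

    channels: equal-length sample lists, one per microphone.
    delays:   assumed integer arrival delay (samples) of the target at each
              microphone; channel i is advanced by delays[i] before summing.
    """
    n = len(channels[0])
    out = [0.0] * n
    for samples, delay in zip(channels, delays):
        for t in range(n):
            src = t + delay  # advance the channel to undo its arrival delay
            if 0 <= src < n:
                out[t] += samples[src]
    return [value / len(channels) for value in out]
```

When the steering delays match the true arrival delays, the target adds coherently while uncorrelated noise averages down; with mismatched delays the target energy is smeared across several samples.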

9:30, SAM-P7.7
ESTIMATING POSITIONS OF ADJACENT PLURAL SPEAKERS USING CORRELATION OF MUSIC SPECTRUM WITH MICROPHONE ARRAY
H. TANAKA, T. KOBAYASHI
In this paper, we propose an improved method of estimating the positions of plural speakers with a microphone array. A well-known method, MUSIC, can be utilized for estimating speakers' positions with high precision. However, in the special case where speakers are closely located, the conventional MUSIC-based method sometimes fails to detect some of the speakers, because each speaker's MUSIC spectrum is disturbed by those of the others. To surmount this difficulty, we propose a new method which utilizes the cross-correlation of the spatial spectrum calculated by MUSIC. Experimental results in a real environment have shown that the proposed method is effective enough to estimate the positions of adjacent speakers.

9:30, SAM-P7.8
LARGE VS SMALL APERTURE MICROPHONE ARRAYS: PERFORMANCE OVER A LARGE FOCAL AREA
J. SACHAR, H. SILVERMAN, W. PATTERSON III
Microphone arrays offer the potential of obtaining a high-quality speech signal from a remote talker in a noisy, multi-source environment. In many important applications, sources of interest may be located anywhere within a large focal area and thus it is desirable that an array's performance be uniform over that area. In this paper, simulated and measured results are presented indicating the performances of representative small and large aperture arrays at various points in a large focal area. From these results, it is clear that uniformity over a large focal area is an advantage of large-aperture arrays.

9:30, SAM-P7.9
ACOUSTIC SOURCE DIRECTION BY HEMISPHERE SAMPLING
S. BIRCHFIELD, D. GILLMOR
A method for estimating the direction to a sound source, using a compact array of microphones, is presented. For each pair of microphones, the signals are prefiltered and correlated. Rather than taking the peak of the correlation vectors as estimates for the time delay between the microphones, all the correlation vectors are accumulated in a common coordinate system, namely a unit hemisphere centered on the microphone array. The maximum cell in the hemisphere then indicates the azimuthal and elevation angles to the source. Unlike previous techniques, this algorithm is applicable to arbitrary microphone configurations, handles more than two microphone pairs, and has no blind spots. Experiments demonstrate significantly increased robustness to noise, compared with previous techniques.
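
The accumulation idea can be sketched in a reduced form. The assumptions here are not from the paper: candidate directions lie on a circle of azimuths in the array plane instead of a hemisphere, the source is in the far field, delays are whole samples, and the prefiltering step is omitted. All names (cross_correlation, expected_lag, estimate_azimuth, simulate) are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s


def cross_correlation(a, b, max_lag):
    """r[lag] = sum_t a[t] * b[t - lag]; peaks at the lag by which a trails b."""
    n = len(a)
    r = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(max(0, lag), min(n, n + lag)):
            total += a[t] * b[t - lag]
        r[lag] = total
    return r


def expected_lag(mic_a, mic_b, azimuth, fs):
    """Far-field lag (samples) by which mic_a trails mic_b for this azimuth."""
    ux, uy = math.cos(azimuth), math.sin(azimuth)
    path_diff = (mic_b[0] - mic_a[0]) * ux + (mic_b[1] - mic_a[1]) * uy
    return round(fs * path_diff / SPEED_OF_SOUND)


def estimate_azimuth(signals, mics, fs, step_deg=5, max_lag=40):
    """Accumulate every pair's correlation over a grid of candidate azimuths."""
    pairs = [(i, j) for i in range(len(mics)) for j in range(i + 1, len(mics))]
    corr = {(i, j): cross_correlation(signals[i], signals[j], max_lag)
            for i, j in pairs}
    best_deg, best_score = None, -math.inf
    for deg in range(0, 360, step_deg):
        azimuth = math.radians(deg)
        # Each pair votes with its correlation value at the lag this
        # direction would produce; no per-pair peak picking is done.
        score = sum(
            corr[(i, j)].get(expected_lag(mics[i], mics[j], azimuth, fs), 0.0)
            for i, j in pairs
        )
        if score > best_score:
            best_deg, best_score = deg, score
    return best_deg


def simulate(source, mics, azimuth, fs, offset=40):
    """Synthetic far-field microphone signals with whole-sample delays."""
    ux, uy = math.cos(azimuth), math.sin(azimuth)
    n = len(source)
    signals = []
    for mx, my in mics:
        delay = offset + round(-fs * (mx * ux + my * uy) / SPEED_OF_SOUND)
        signals.append([source[t - delay] if 0 <= t - delay < n else 0.0
                        for t in range(n)])
    return signals
```

Since every pair contributes its whole correlation function to a shared grid, a pair whose individual peak is corrupted by noise can still be outvoted by the others, and the microphone positions may be arbitrary.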

9:30, SAM-P7.10
MULTIMODAL LOCALIZATION OF A FLYING BAT
K. GHOSE, D. ZOTKIN, R. DURAISWAMI, C. MOSS
In this paper we present a new multimodal system that combines stereoscopic and audio-based source localization to perform behavioral studies on a flying bat. Also presented are novel algorithms for audio source localization. The bat was allowed to fly in an anechoic flight room and monitored by two high-speed video cameras. The vocalizations of the bat were simultaneously recorded using seven microphones. The data was then processed offline to localize the source and reconstruct the trajectory of the bat. We compare the performance of the localization algorithm with the position data obtained from stereoscopic pictures of the bat. The results confirm that the stereoscopic analysis and the audio localization are in good agreement. This system opens up new possibilities for performing multimodal research, and developing more tightly integrated algorithms.

9:30, SAM-P7.11
INVESTIGATION OF EFFECTIVENESS OF MICROPHONE ARRAYS FOR IN CAR USE BASED ON SOUND FIELD SIMULATION
V. GALANENKO, A. KALYUZHNY, A. KOVTONYUK
The effectiveness of small microphone arrays for in-car use is investigated. These arrays are designed for speech enhancement against background noise within the cabin of a moving car. Speech and noise simulation, based on a purpose-designed mathematical algorithm for sound field modeling within a car cavity, is applied to predict the effectiveness of the space-time processing algorithms. The mathematical model takes into account the complicated cabin geometry, the difference of its dimensions on the wavelength scale at low and high frequencies, the frequency-dependent sound absorption of the cabin surfaces, and distributed noise sources with their cross-correlation. Theoretical estimates of the microphone arrays' effectiveness and simulated output signals (for subjective evaluation) are presented.