Title: Multispeaker Speech Activity Detection for the ICSI Meeting Recorder
Authors: Thilo Pfau, Daniel P.W. Ellis, Andreas Stolcke
Abstract:
As part of a project on speech recognition in meeting environments, we have collected a corpus of multichannel meeting recordings. We expected the identification of speaker activity to be straightforward given that the participants had individual microphones, but simple approaches yielded unacceptably erroneous labelings, mainly due to crosstalk between nearby speakers and wide variations in channel characteristics. Therefore, we have developed a more sophisticated approach for multichannel speech activity detection using a simple hidden Markov model (HMM).
A baseline HMM speech activity detector has been extended to use mixtures of Gaussians. The use of feature normalization and crosscorrelation processing results in a 35% relative reduction of the frame error rate.
Speech recognition experiments show that using the output of the speech activity detector for presegmenting the recognizer input leads to word error rates within 10% of those achieved with manual turn labeling.
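To illustrate the general idea of HMM-based speech activity detection, the following is a minimal sketch, not the detector described above. It assumes a two-state HMM (nonspeech and speech) over a single scalar feature per frame, with one Gaussian per state rather than a mixture, and decodes the most likely state sequence with the Viterbi algorithm. The function names, the feature, and all parameter values (means, variances, stay_prob) are hypothetical and chosen only for illustration; feature normalization and crosscorrelation processing are omitted.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_speech_activity(features, means=(0.0, 4.0),
                            variances=(1.0, 1.0), stay_prob=0.9):
    """Label each frame 0 (nonspeech) or 1 (speech) with a two-state HMM.

    All parameter values are illustrative assumptions, not values
    taken from the paper.
    """
    if not features:
        return []
    log_stay = math.log(stay_prob)
    log_switch = math.log(1.0 - stay_prob)

    # Uniform initial state distribution plus first-frame emission scores.
    scores = [math.log(0.5) + log_gaussian(features[0], means[s], variances[s])
              for s in (0, 1)]
    backpointers = []

    for x in features[1:]:
        new_scores, pointers = [], []
        for s in (0, 1):
            # Best predecessor for state s at this frame.
            candidates = [scores[p] + (log_stay if p == s else log_switch)
                          for p in (0, 1)]
            best = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best)
            new_scores.append(candidates[best]
                              + log_gaussian(x, means[s], variances[s]))
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state to recover the full path.
    state = 0 if scores[0] >= scores[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    frames = [0.1, -0.2, 0.0, 4.1, 3.9, 4.2, 0.0, 0.1]
    print(viterbi_speech_activity(frames))
```

In this sketch the transition penalty smooths the labeling: a single frame with a moderately raised feature value surrounded by low values stays labeled as nonspeech, whereas a per-frame threshold would flag it.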