Session: Other Topics in ASR Robustness, Adaptation and Language Modeling

Title: Acoustic Analysis and Recognition of Whispered Speech

Authors: Taisuke Itoh, Kazuya Takeda, Fumitada Itakura

Abstract: This paper discusses the acoustic properties of whispered speech and a method for recognizing it.
A database was prepared that consists of whispered speech, normal speech, and the corresponding facial video images, covering more than 6,000 sentences from 100 speakers.
A comparison between whispered and normal utterances shows that 1) the cepstrum distance between them is 4 dB for voiced phonemes and 2 dB for unvoiced phonemes, 2) the spectral tilt of whispered speech is less steep than that of normal speech, and 3) the frequencies of the lower formants (below 1.5 kHz) are lower than those of normal speech. Acoustic models (HMMs) trained on the whispered speech database attain an accuracy of 60% in syllable recognition experiments. This accuracy improves to 63% when MLLR adaptation is applied, while a normal speech HMM adapted with whispered speech attains only 56% syllable accuracy.
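
As an illustration of how a cepstrum distance can be expressed in dB, the following minimal sketch uses the commonly cited conversion CD = (10 / ln 10) * sqrt(2 * sum_k (c_k - c'_k)^2). The abstract does not state which definition, cepstral order, or analysis settings the authors used, so the formula choice, the function name `cepstral_distance_db`, and the example vectors are all assumptions made for illustration only.

```python
import math


def cepstral_distance_db(c_a, c_b):
    """Cepstral distance in dB between two cepstral vectors.

    Assumption: both vectors hold coefficients c_1..c_K (c_0 excluded)
    from time-aligned frames; this is not taken from the paper.
    """
    if len(c_a) != len(c_b):
        raise ValueError("cepstral vectors must have the same length")
    # Sum of squared coefficient differences.
    squared_diff = sum((a - b) ** 2 for a, b in zip(c_a, c_b))
    # Convert the log-spectral difference to decibels.
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared_diff)


# Hypothetical cepstra for one normal and one whispered frame.
normal_frame = [0.9, -0.3, 0.15, 0.05]
whispered_frame = [0.5, -0.1, 0.10, 0.02]
distance = cepstral_distance_db(normal_frame, whispered_frame)
print(f"cepstral distance: {distance:.2f} dB")
```

In practice such a distance would be averaged over many aligned frames per phoneme class (voiced or unvoiced) before comparing it with figures like those reported above.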

a01ti092.ps a01ti092.pdf