Session: Other Topics in ASR Robustness, Adaptation and Language Modeling

Title: STATE SYNCHRONOUS MODELING OF AUDIO-VISUAL INFORMATION FOR BI-MODAL SPEECH RECOGNITION

Authors: Satoshi Nakamura, Kenichi Kumatani, Satoshi Tamura

Abstract: Demand has recently been increasing for Automatic Speech Recognition (ASR) systems that can operate robustly in acoustically noisy environments. This paper proposes a method to effectively integrate audio and visual information in audio-visual (bi-modal) ASR systems. Such integration inevitably requires modeling the synchronization of the audio and visual information. To address the time lag and the correlation between the individual features of speech and lip movements, we introduce an integrated HMM of audio-visual information based on HMM composition. The proposed model can represent state synchronicity not only within a phoneme but also between phonemes. Evaluation experiments show that the proposed method improves the recognition accuracy for noisy speech.
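The core idea of HMM composition mentioned in the abstract is to build a product-state model from an audio HMM and a visual HMM, so that the two streams can occupy different states and thereby model asynchrony. The sketch below is a minimal illustration of that product-state construction; the model sizes, transition values, and function names are illustrative assumptions, not the paper's actual models.

```python
import itertools

def compose(A_audio, A_visual):
    """Hedged sketch of product-state HMM composition (illustrative, not
    the paper's exact formulation).

    Given row-stochastic transition matrices A_audio (audio HMM) and
    A_visual (visual HMM), build a composed model over state pairs (i, j)
    with transition probability
        P((i, j) -> (k, l)) = P_audio(i -> k) * P_visual(j -> l).
    Allowing all pairs lets the audio and visual streams be in different
    states at the same time, which is one way to represent the time lag
    between speech and lip movements.
    """
    na, nv = len(A_audio), len(A_visual)
    states = list(itertools.product(range(na), range(nv)))
    A = {}
    for (i, j) in states:
        for (k, l) in states:
            A[(i, j), (k, l)] = A_audio[i][k] * A_visual[j][l]
    return states, A

# Toy 2-state audio and visual HMMs (made-up numbers for illustration).
A_audio = [[0.7, 0.3],
           [0.4, 0.6]]
A_visual = [[0.9, 0.1],
            [0.2, 0.8]]

states, A = compose(A_audio, A_visual)
# The composed model has na * nv = 4 product states, and each row of the
# composed transition matrix still sums to 1 (product of row-stochastic rows).
for s in states:
    row_sum = sum(A[s, t] for t in states)
    print(s, round(row_sum, 6))
```

A strictly state-synchronous model would instead restrict the product space to pairs with i == j; the paper's contribution, as the abstract describes it, is allowing synchronicity constraints both within and between phonemes rather than forcing lock-step alignment.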

a01sn087.ps a01sn087.pdf