Session: Audio-video Information Retrieval and Digital Archives - Multilingual and Speech-to-Speech Translation

Title: SPEECH DATA RETRIEVAL SYSTEM CONSTRUCTED ON A UNIVERSAL PHONETIC CODE DOMAIN

Authors: Kazuyo Tanaka, Yoshiaki Itoh, Hiroaki Kojima, Nahoko Fujimura

Abstract: We propose a novel speech processing framework in which all speech data are encoded into universal phonetic code (UPC) sequences, and speech processing systems such as speech recognition, retrieval, and digesting are constructed on this UPC domain. As a first step, we introduce an IPA-based sub-phonetic segment (SPS) set to handle multilingual speech, and we develop a procedure for estimating acoustic models of the SPS from IPA-like phone models. The key point of the framework is that environment adaptation is incorporated into the SPS encoding stage. This makes it possible to normalize acoustic variations and to extract the language factor contained in speech signals as encoded SPS sequences. We confirm these characteristics by constructing a speech retrieval system on the SPS domain. The system can retrieve key phrases, given as speech, from speech data recorded in different environments. We show several preliminary experimental results for this system, using Japanese and English sentence speech sets.
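
The abstract does not state how key phrases are matched once both the query and the stored speech have been encoded as symbol sequences. As a rough illustration only, the sketch below assumes one common choice, approximate substring matching by edit distance, and is not the authors' method. The function name `spot_key_phrase` and the single-character labels are hypothetical stand-ins for SPS codes.

```python
from typing import Sequence, Tuple


def spot_key_phrase(query: Sequence[str], data: Sequence[str]) -> Tuple[int, int]:
    """Find the substring of `data` closest to `query` in edit distance.

    Returns (best_distance, end_index), where end_index is the exclusive
    position in `data` at which the best-matching substring ends.
    Assumption: unit costs for substitution, insertion, and deletion.
    """
    m = len(query)
    # col[i] = distance between query[:i] and the best substring of data
    # ending at the current position. Before any data is read, matching
    # query[:i] costs i (every query symbol is unmatched).
    col = list(range(m + 1))
    best_dist, best_end = col[m], 0

    for j, symbol in enumerate(data, start=1):
        # new[0] = 0 lets a match start at any position in data for free.
        new = [0] * (m + 1)
        for i in range(1, m + 1):
            substitute = col[i - 1] + (query[i - 1] != symbol)
            skip_data_symbol = col[i] + 1
            skip_query_symbol = new[i - 1] + 1
            new[i] = min(substitute, skip_data_symbol, skip_query_symbol)
        col = new
        if col[m] < best_dist:
            best_dist, best_end = col[m], j

    return best_dist, best_end


if __name__ == "__main__":
    # Hypothetical code sequences: the query occurs in the data with one
    # symbol changed, as might happen across recording environments.
    print(spot_key_phrase(list("kat"), list("xxkotxx")))
```

A low returned distance suggests that the key phrase occurs near `end_index`. In a real system the unit costs would presumably be replaced by distances derived from the acoustic models, which this sketch does not attempt.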
