Title:
Markovian Combination of Language and Prosodic Models for Better Speech Understanding and Recognition
Abstract:
Traditionally, "language" models capture only the word sequences of a language.
A crucial component of spoken language, however, is its prosody, i.e., its
rhythmic and melodic properties.
This talk will summarize recent work on integrated, computationally efficient
modeling of word sequences and prosodic properties of speech, for a variety of
speech recognition and understanding tasks, such as dialog act tagging,
disfluency detection, and segmentation into sentences and topics.
In each case it turns out that hidden Markov representations of the
underlying structures and associated observations arise naturally, and allow
existing speech recognizers to be combined with separately trained prosodic
classifiers. The same HMM-based models can be used in two modes:
to recover hidden structure (such as sentence boundaries), or to evaluate
speech recognition hypotheses, thereby integrating prosody into the
recognition process.
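
As a rough illustration of the two modes, here is a minimal sketch, not the implementation presented in the talk. It assumes a toy two-state HMM over the events after each word (N = no boundary, S = sentence boundary), a first-order event transition table standing in for a real language model, and a prosodic classifier whose posteriors are divided by class priors to serve as scaled likelihoods. All function names, parameter names, and numbers are hypothetical.

```python
import math

EVENTS = ("N", "S")  # N = no boundary, S = sentence boundary (toy event set)


def scaled_likelihood(post, priors, event):
    # Assumption: classifier posterior / class prior acts as a scaled likelihood.
    return post[event] / priors[event]


def viterbi_boundaries(lm_init, lm_trans, prosody_post, priors):
    """Mode 1: recover the most likely hidden event after each word."""
    score = {e: math.log(lm_init[e] * scaled_likelihood(prosody_post[0], priors, e))
             for e in EVENTS}
    backpointers = []
    for post in prosody_post[1:]:
        new_score, ptr = {}, {}
        for e in EVENTS:
            # Best predecessor under the (toy) language-model transition table.
            prev = max(EVENTS, key=lambda p: score[p] + math.log(lm_trans[p][e]))
            new_score[e] = (score[prev] + math.log(lm_trans[prev][e])
                            + math.log(scaled_likelihood(post, priors, e)))
            ptr[e] = prev
        backpointers.append(ptr)
        score = new_score
    path = [max(EVENTS, key=lambda e: score[e])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


def forward_logscore(lm_init, lm_trans, prosody_post, priors):
    """Mode 2: score a hypothesis by summing over all hidden event sequences."""
    alpha = {e: lm_init[e] * scaled_likelihood(prosody_post[0], priors, e)
             for e in EVENTS}
    for post in prosody_post[1:]:
        alpha = {e: sum(alpha[p] * lm_trans[p][e] for p in EVENTS)
                    * scaled_likelihood(post, priors, e)
                 for e in EVENTS}
    return math.log(sum(alpha.values()))


# Toy inputs (made up for illustration): three word boundaries.
lm_init = {"N": 0.9, "S": 0.1}
lm_trans = {"N": {"N": 0.9, "S": 0.1}, "S": {"N": 0.95, "S": 0.05}}
priors = {"N": 0.9, "S": 0.1}
prosody_post = [{"N": 0.95, "S": 0.05},
                {"N": 0.2, "S": 0.8},
                {"N": 0.9, "S": 0.1}]

print(viterbi_boundaries(lm_init, lm_trans, prosody_post, priors))  # ['N', 'S', 'N']
print(forward_logscore(lm_init, lm_trans, prosody_post, priors))
```

In this sketch the Viterbi pass corresponds to recovering hidden structure, while the forward score could be computed for each competing recognition hypothesis and used to rerank them.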
More information about the individual research projects, as well as the
publications listed, is available at
http://www.speech.sri.com/projects/hiddenevents.html, http://www.speech.sri.com/projects/sieve/, and http://www.clsp.jhu.edu/ws97/discourse/.
Curriculum vitae:
Andreas Stolcke received his undergraduate degree in Computer Science
from the Technische Universitaet Munich in 1988, and a Ph.D. in
Computer Science from the University of California at Berkeley in
1994. He was a research assistant and postdoctoral researcher at the
International Computer Science Institute (ICSI) in Berkeley, doing
research on connectionist and probabilistic methods for natural
language processing. His doctoral thesis investigated learning and
parsing algorithms for probabilistic grammars. Andreas is currently a
Senior Research Engineer with the Speech Technology and Research
Laboratory at SRI International, as well as a Visiting Researcher at
ICSI. His recent work has been on statistical models for speech
recognition and understanding, focusing on language and prosodic
models for spontaneous and conversational speech.