Abstract: Session MMSP-3
MMSP-3.1
Robust Speaker Verification via Fusion of Speech and Lip Modalities
Timothy J Wark,
Sridha Sridharan,
Vinod Chandran (Queensland University of Technology)
This paper investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. Our own previous work, and that of others, has shown that features extracted from a speaker's moving lips carry speaker-dependent information that is complementary to speech features. We demonstrate that fusing lip and speech information yields a highly robust speaker verification system that outperforms either subsystem alone. We present a new technique for determining the weight applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective in reducing the false acceptance and false rejection error rates in the presence of background noise.
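The weighting technique itself is not described in this abstract. As a rough illustration only, the sketch below assumes score-level fusion, s = alpha * s_lip + (1 - alpha) * s_speech, with the lip weight alpha chosen by a grid search that minimizes false acceptances plus false rejections on held-out scores at an assumed decision threshold of 0.5. The function names and the toy scores are hypothetical.

```python
import numpy as np


def fuse(lip, speech, alpha):
    """Weighted linear score fusion: alpha * lip + (1 - alpha) * speech."""
    return alpha * np.asarray(lip, dtype=float) + (1.0 - alpha) * np.asarray(speech, dtype=float)


def choose_weight(gen_lip, gen_speech, imp_lip, imp_speech, threshold=0.5, grid=11):
    """Grid-search the lip weight alpha on held-out scores.

    Assumed criterion (not the paper's): minimise false rejections of genuine
    trials plus false acceptances of impostor trials at a fixed threshold.
    Ties keep the smallest alpha on the grid.
    """
    best_alpha, best_err = 0.0, None
    for alpha in np.linspace(0.0, 1.0, grid):
        false_rej = np.sum(fuse(gen_lip, gen_speech, alpha) < threshold)
        false_acc = np.sum(fuse(imp_lip, imp_speech, alpha) >= threshold)
        err = int(false_rej + false_acc)
        if best_err is None or err < best_err:
            best_alpha, best_err = float(alpha), err
    return best_alpha, best_err


# Hypothetical held-out scores in [0, 1]; noise corrupts one speech trial of each kind.
gen_lip, gen_speech = [0.8, 0.9, 0.7], [0.9, 0.3, 0.8]
imp_lip, imp_speech = [0.2, 0.1, 0.3], [0.1, 0.8, 0.2]
alpha, err = choose_weight(gen_lip, gen_speech, imp_lip, imp_speech)
print(alpha, err)
```

Here the speech subsystem rejects one genuine trial and accepts one impostor trial; the search settles on the smallest grid weight at which the lip scores correct both errors.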
MMSP-3.2
Unsupervised Lip Segmentation Under Natural Conditions
Marc Lievin,
Franck Luthon (Signal And Image Laboratory, LIS-INPG, France)
An unsupervised algorithm for segmenting a speaker's lips is presented in this paper.
A color video sequence of the speaker's face is acquired under natural lighting conditions and without any particular make-up.
First, a logarithmic color transform is performed from RGB to HI (hue, intensity) color space, and sequence-dependent parameters are evaluated.
Second, a statistical approach based on Markov random field modeling segments the mouth shape using the red-hue-predominant region and motion in a spatiotemporal neighborhood.
Simultaneously, a region of interest (ROI) is automatically extracted. Third, the speaker's lip shape is extracted from the final hue field, with good-quality results in this challenging situation.
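The abstract does not give the exact color transform or MRF energy, so the sketch below is a simplified stand-in under stated assumptions: a hue-like red ratio R/(R+G) replaces the paper's logarithmic HI transform, the energy is a two-class Gaussian data term plus a Potts smoothness prior, labels are updated with synchronous ICM-style sweeps, and the motion term is ignored. The ROI is taken as the bounding box of the lip label. All names and parameter values (`beta`, `n_iter`) are illustrative.

```python
import numpy as np


def red_ratio(rgb):
    """Hypothetical hue-like feature R/(R+G): higher where red dominates (lips)."""
    rgb = np.asarray(rgb, dtype=float)
    return rgb[..., 0] / (rgb[..., 0] + rgb[..., 1] + 1e-6)


def disagreement(labels, value):
    """Number of 4-neighbours whose label differs from `value` (edges replicated)."""
    p = np.pad(labels, 1, mode="edge")
    nbrs = (p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:])
    return sum((n != value).astype(float) for n in nbrs)


def mrf_segment(feature, beta=1.5, n_iter=5):
    """Two-label MRF segmentation (0 = skin, 1 = lips) of a scalar feature map."""
    f = np.asarray(feature, dtype=float)
    # 1-D two-means initialisation of the class means, seeded at min and max.
    mu = np.array([f.min(), f.max()])
    for _ in range(10):
        labels = (np.abs(f - mu[1]) < np.abs(f - mu[0])).astype(int)
        mu = np.array([f[labels == 0].mean(), f[labels == 1].mean()])
    var = 0.5 * (f[labels == 0].var() + f[labels == 1].var()) + 1e-12
    # Synchronous ICM-style sweeps: Gaussian data term + Potts prior weighted by beta.
    for _ in range(n_iter):
        energy = [(f - mu[k]) ** 2 / (2 * var) + beta * disagreement(labels, k) for k in (0, 1)]
        labels = (energy[1] < energy[0]).astype(int)
    return labels


def roi_bbox(mask):
    """Bounding box (row_min, row_max, col_min, col_max) of the lip label."""
    rows, cols = np.nonzero(mask)
    return int(rows.min()), int(rows.max()), int(cols.min()), int(cols.max())


# Synthetic frame: skin-like background with a reddish "mouth" block, plus noise.
rng = np.random.default_rng(0)
img = np.empty((40, 40, 3))
img[:] = (200, 150, 120)
img[25:33, 12:29] = (180, 60, 70)
img += rng.normal(0, 10, img.shape)
mask = mrf_segment(red_ratio(img))
print(roi_bbox(mask))
```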
MMSP-3.3
Automatic Snakes For Robust Lip Boundaries Extraction
Patrice Delmas,
Pierre-Yves Coulon,
Vincent Fristot (LIS-INPG)
Active contours, or snakes, are widely used
in object segmentation for their ability to integrate
feature extraction and the linking of candidate pixels
in a single energy-minimizing process.
However, their sensitivity to parameter values and
initialization is also a well-known problem.
The performance of snakes can be improved by
initializing them close to the desired solution.
We present here a fine extraction of the mouth region
of interest (ROI) using the gray-level image and the
corresponding gradient information.
We combine this technique with an original snake method.
The automatic snake uses spatially varying
coefficients so that it keeps a mouth-like shape
throughout its evolution. Experiments on a large image
database show that both the mouth ROI extraction and the
automatic snake algorithms are robust to changes of
speaker. The main application of our algorithms is
video-conferencing.
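To illustrate what spatially varying coefficients mean for a snake, here is a sketch of the classic semi-implicit snake update, (A + gamma I) v_new = gamma v + F(v), where the stiffness matrix A is built from a per-point elasticity alpha_i and rigidity beta_i. This is not the authors' automatic snake: the coefficient values, the lower rigidity at two "corner" points, and the `force` callable (which would supply the image force, e.g. from the mouth ROI gradient) are assumptions.

```python
import numpy as np


def internal_matrix(alpha, beta):
    """Stiffness matrix A of a closed snake with per-point coefficients.

    E_int = 0.5 * sum_i (alpha_i |v_i - v_{i-1}|^2 + beta_i |v_{i-1} - 2 v_i + v_{i+1}|^2),
    so that grad E_int = A @ v (applied to each coordinate column).
    """
    n = len(alpha)
    eye = np.eye(n)
    prev = np.roll(eye, -1, axis=1)          # (prev @ v)[i] == v[i-1], wrapping around
    d1 = eye - prev                          # first difference (elasticity)
    d2 = prev - 2 * eye + prev.T             # second difference (rigidity)
    return d1.T @ np.diag(alpha) @ d1 + d2.T @ np.diag(beta) @ d2


def evolve(points, alpha, beta, force=None, gamma=1.0, n_iter=50):
    """Semi-implicit steps solving (A + gamma I) v_new = gamma v + F(v).

    `force` is a hypothetical callable returning an (n, 2) image force; None means no force.
    """
    v = np.asarray(points, dtype=float)
    m = internal_matrix(np.asarray(alpha, float), np.asarray(beta, float)) + gamma * np.eye(len(v))
    for _ in range(n_iter):
        f = np.zeros_like(v) if force is None else force(v)
        v = np.linalg.solve(m, gamma * v + f)
    return v


def rms_radius(p):
    """Root-mean-square distance of the contour points from their centroid."""
    return float(np.sqrt(np.mean(np.sum((p - p.mean(axis=0)) ** 2, axis=1))))


n = 40
theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
circle = np.c_[32 + 14 * np.cos(theta), 32 + 14 * np.sin(theta)]
alpha = np.full(n, 0.1)
beta = np.full(n, 0.1)
beta[[0, n // 2]] = 0.01                     # hypothetical: let the two "mouth corners" bend more
out = evolve(circle, alpha, beta)
print(rms_radius(circle), rms_radius(out))
```

With no image force, a closed snake only shrinks under its internal energy while its centroid stays fixed, which is a quick sanity check on A; in an actual lip tracker the force term would hold the points on the lip boundary.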
MMSP-3.4
Dynamic Hand Gesture Understanding - A New Approach
Mohammed (Electro-Technical Laboratory),
Subhasis Chaudhuri (Indian Institute of Technology, Bombay)
Analysis of a dynamic hand gesture requires processing a spatio-temporal
image sequence. The actual length of the sequence varies with each
instantiation of the gesture. We propose a novel, vision-based
system for automatic interpretation of a limited set of dynamic hand gestures.
The system extracts the temporal signature of the hand motion from the
performed gesture; this signature is then analyzed by a finite state machine to
automatically interpret the performed gesture.
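A finite state machine can absorb the variable length of a gesture by giving each state self-loops. The sketch below assumes the temporal signature is a sequence of quantized frame-to-frame motion directions of the hand centroid (R, L, U, D, or S for still); the stillness threshold, the three-gesture vocabulary, and all function names are hypothetical.

```python
import math


def direction_symbols(points, still=0.5):
    """Quantise frame-to-frame hand motion into R/L/U/D, or S when nearly still."""
    symbols = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if math.hypot(dx, dy) < still:
            symbols.append("S")
        elif abs(dx) >= abs(dy):
            symbols.append("R" if dx > 0 else "L")
        else:
            symbols.append("D" if dy > 0 else "U")   # image y axis points down
    return symbols


def accepts(pattern, symbols):
    """FSM whose state k means 'first k strokes of the pattern seen'.

    Pauses (S) and repeats of the current stroke are self-loops, so the same
    gesture performed at any speed ends in the accepting state.
    """
    state = 0
    for s in symbols:
        if s == "S" or (state > 0 and s == pattern[state - 1]):
            continue                                 # self-loop
        if state < len(pattern) and s == pattern[state]:
            state += 1                               # advance to the next stroke
        else:
            return False                             # dead state: unexpected stroke
    return state == len(pattern)


GESTURES = {"swipe_right": "R", "back_and_forth": "RL", "wave": "RLRL"}  # hypothetical set


def interpret(points):
    symbols = direction_symbols(points)
    return [name for name, pattern in GESTURES.items() if accepts(pattern, symbols)]


fast = [(x, 0) for x in range(6)] + [(x, 0) for x in range(4, -1, -1)]
slow = [(0, 0), (0, 0), (1, 0), (2, 0), (2, 0), (3, 0), (2, 0), (1, 0), (1, 0), (0, 0)]
print(interpret(fast), interpret(slow))  # ['back_and_forth'] ['back_and_forth']
```

Both trajectories trace the same right-then-left motion, at different speeds and with pauses, and the same machine accepts both.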
MMSP-3.5
Non-Minimum Phase Inverse Filter Methods for Immersive Audio Rendering
Athanasios Mouchtaris (USC Integrated Media Systems Center),
Panagiotis Reveliotis,
Chris Kyriakakis (USC Integrated Media Systems Center)
Immersive audio systems are being envisioned for applications that include teleconferencing and telepresence; augmented and virtual reality for manufacturing and entertainment; air traffic control, pilot warning, and guidance systems; displays for the visually impaired; distance learning; and professional sound and picture editing for television and film. The principal function of such systems is to synthesize, manipulate, and render sound fields in real time. In this paper we examine several signal processing considerations in spatial sound rendering over loudspeakers. We propose two methods that can be used to implement the filters needed to generate virtual sound sources based on synthetic head-related transfer functions with the same spectral characteristics as those of the real source.
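The two proposed methods are not detailed in the abstract. For background, one standard way to invert a non-minimum-phase response, whose exact causal inverse is unstable, is a least-squares FIR inverse with a modeling delay. The single-channel sketch below makes that assumption; the filter length, the delay, and the toy response `c` are illustrative, and loudspeaker rendering in practice would invert a matrix of HRTF-based responses rather than one filter.

```python
import numpy as np


def convolution_matrix(c, length):
    """Matrix C such that C @ h == np.convolve(c, h) for h of the given length."""
    c = np.asarray(c, dtype=float)
    C = np.zeros((len(c) + length - 1, length))
    for j in range(length):
        C[j:j + len(c), j] = c
    return C


def ls_inverse(c, length, delay):
    """Least-squares FIR h with c * h ~ delta[n - delay].

    The modelling delay lets a causal h approximate the stable (anti-causal)
    inverse of a non-minimum-phase c.
    """
    C = convolution_matrix(c, length)
    target = np.zeros(C.shape[0])
    target[delay] = 1.0
    h, *_ = np.linalg.lstsq(C, target, rcond=None)
    return h


c = [0.5, 1.0]                     # zero at z = -2, outside the unit circle: non-minimum phase
h = ls_inverse(c, length=41, delay=20)
eq = np.convolve(c, h)             # equalized response, ideally a delayed impulse
print(int(np.argmax(np.abs(eq))))
```

Setting `delay` to 0 would force a causal approximation of an unstable inverse; a delay near the middle of the filter gives the anti-causal tail room to decay.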
MMSP-3.6
Classification of Time Delay Estimates for Robust Speaker Localization
Norbert K Strobel,
Rudolf Rabenstein (University of Erlangen-Nuernberg, Telecommunications Laboratory)
This paper proposes a solution to the problem of
robust speaker localization under adverse acoustic conditions. The
approach is based on the classification of time delay estimates.
Two classification techniques are investigated in detail:
maximum likelihood (ML) classification and classification based on
histogram comparison.
Their performance under adverse acoustic conditions is compared to
the outcomes obtained with the traditional approach, which uses time
delay estimates directly to infer speaker positions.
Experiments indicate that the ML classification
method provides little improvement over the traditional method.
On the other hand, using histogram classification,
we can improve the probability of correct speaker localization by more than 60%
compared to either the traditional approach or the ML classification technique.
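A minimal sketch of histogram-based classification, assuming each candidate speaker position has a reference histogram of time delay estimates collected beforehand, and that a batch of new estimates is assigned to the position whose histogram overlaps it most (histogram intersection). The comparison measure, the bin layout, and the outlier model in `simulate_tde` are assumptions, not the paper's.

```python
import numpy as np

EDGES = np.arange(-20.5, 21.5, 1.0)   # hypothetical: one bin per integer-sample delay in [-20, 20]


def delay_histogram(delays):
    """Normalised histogram of time delay estimates."""
    counts, _ = np.histogram(delays, bins=EDGES)
    return counts / max(counts.sum(), 1)


def classify_by_histogram(delays, references):
    """Return the position whose reference histogram best overlaps the observed one."""
    h = delay_histogram(delays)
    scores = {pos: np.minimum(h, ref).sum() for pos, ref in references.items()}
    return max(scores, key=scores.get)


def simulate_tde(rng, true_delay, n, outlier_rate=0.3):
    """Hypothetical TDE model: jitter around the true delay plus uniform outliers."""
    d = true_delay + rng.normal(0.0, 1.0, n)
    outliers = rng.random(n) < outlier_rate
    d[outliers] = rng.uniform(-20, 20, outliers.sum())
    return d


rng = np.random.default_rng(0)
positions = {"left": -6, "centre": 0, "right": 6}   # true delays in samples (illustrative)
refs = {p: delay_histogram(simulate_tde(rng, d, 2000)) for p, d in positions.items()}
result = classify_by_histogram(simulate_tde(rng, 0, 200), refs)
print(result)
```

Because the decision uses the whole distribution of estimates, a large fraction of spurious delays shifts every score by a similar amount instead of dragging a single position estimate away from the true value.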
MMSP-3.7
Alpha-Stable Robust Modeling of Background Noise for Enhanced Sound Source Localization
Panayiotis G Georgiou (USC Integrated Media Systems Center),
Panagiotis Tsakalides (USC Signal and Image Processing Institute),
Chris Kyriakakis (USC Integrated Media Systems Center)
In this paper we address the problem of sound source localization in the presence of impulsive noise for applications in immersive telepresence and teleconferencing. Traditional Gaussian modeling of noise signals fails when the signals exhibit impulsive behavior. We instead use the symmetric alpha-stable (SαS) model, which better accounts for the outliers that occur in real-world signals. Real data are used to compare the performance of the Gaussian and alpha-stable models, and we demonstrate that the alpha-stable model approximates the noise signal much better than the Gaussian model.
Furthermore, we study the problem of time delay estimation (TDE) and demonstrate the shortcomings of TDE techniques based on second-order statistics when the noise is SαS. We propose an alternative to these second-order methods based on fractional lower-order statistics and demonstrate the resulting improvement through simulation experiments.
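One fractional lower-order alternative to ordinary cross-correlation applies the signed power x^<p> = sign(x)|x|^p to both signals before correlating, so that impulsive samples are strongly compressed. The sketch below assumes that estimator, a circular-shift signal model, and p = 0.25 (an assumed choice that keeps the fractional moments finite for Cauchy noise, the alpha-stable case alpha = 1); it is not necessarily the estimator used in the paper.

```python
import numpy as np


def signed_power(x, p):
    """Fractional-power nonlinearity x^<p> = sign(x) * |x|**p."""
    return np.sign(x) * np.abs(x) ** p


def flos_tde(x, y, max_lag, p=0.25):
    """Estimate D in y[n] ~ x[n - D] by maximising the fractional lower-order
    cross-correlation mean(x^<p>[n] * y^<p>[n + lag]) over integer lags.

    Assumes a circular shift (np.roll) for simplicity.
    """
    xp, yp = signed_power(x, p), signed_power(y, p)
    lags = np.arange(-max_lag, max_lag + 1)
    scores = [np.mean(xp * np.roll(yp, -lag)) for lag in lags]
    return int(lags[np.argmax(scores)])


rng = np.random.default_rng(1)
s = rng.standard_normal(4000)                              # source signal
x = s + 0.3 * rng.standard_cauchy(4000)                    # impulsive (Cauchy) noise at mic 1
y = np.roll(s, 7) + 0.3 * rng.standard_cauchy(4000)        # delayed copy plus noise at mic 2
est = flos_tde(x, y, max_lag=20)
print(est)
```

Setting `p=1.0` turns the same function into ordinary cross-correlation, which is a convenient way to compare the two under identical noise.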
MMSP-3.8
Sound Onset Detection by Applying Psychoacoustic Knowledge
Anssi P Klapuri (Signal Processing Laboratory of the Tampere University of Technology, Finland)
A system was designed that is able to detect the
perceptual onsets of sounds in acoustic signals.
The system is general with regard to the sounds involved
and was found to be robust for different kinds of
signals. This was achieved without assuming regularities
in the positions of the onsets. In this paper,
a method is first proposed that can determine the
beginnings of sounds that exhibit onset imperfections,
i.e., sounds whose amplitude envelope does not rise
monotonically. Then we describe the system itself,
which utilizes band-wise processing and a psychoacoustic
model of intensity coding to combine the results
from the separate frequency bands. The performance
of the system was validated by applying it to the
detection of onsets in musical signals ranging
from rock music to classical and big band recordings.
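One way to make detection less sensitive to envelopes that rise in several steps is to look at the relative rather than the absolute change of level. The single-band sketch below thresholds the first-order relative difference d/dt log A(t) of a frame-RMS envelope A(t) and reports one onset per upward threshold crossing. It omits the band-wise processing and the psychoacoustic combination step; frame size, hop, and threshold are assumed values.

```python
import numpy as np


def frame_rms(x, frame=256, hop=128):
    """RMS amplitude envelope over overlapping frames."""
    n_frames = 1 + (len(x) - frame) // hop
    return np.array([np.sqrt(np.mean(x[i * hop:i * hop + frame] ** 2)) for i in range(n_frames)])


def detect_onsets(x, sr, frame=256, hop=128, threshold=1.0, eps=1e-10):
    """Single-band sketch: threshold d/dt log A(t), one onset per rising crossing."""
    log_env = np.log(frame_rms(x, frame, hop) + eps)
    d = np.diff(log_env)                       # relative difference (per frame)
    onsets = []
    for i, value in enumerate(d):
        if value > threshold and (i == 0 or d[i - 1] <= threshold):
            onsets.append((i + 1) * hop / sr)  # start time of the frame where the level jumps
    return onsets


sr = 8000
rng = np.random.default_rng(0)
t = np.arange(sr) / sr
x = 1e-4 * rng.standard_normal(sr)                       # near-silence
x[sr // 2:] += np.sin(2 * np.pi * 440 * t[sr // 2:])     # tone starting at 0.5 s
print(detect_onsets(x, sr))
```

The reported time is quantized to the hop size, so the onset at 0.5 s is located to within one frame.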
MMSP-3.9
The Developmental Approach to Multimedia Speech Learning
Juyang Weng (Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA),
Yong-Beom Lee (Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824 USA),
Colin H. Evans (Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA)
This paper introduces the developmental approach to speech
learning, motivated by human cognitive development from
infancy to adulthood. Central to the developmental approach
is the so-called developmental algorithm.
We introduce AA-learning as a basic learning mode for our developmental
algorithm. The developmental algorithm enables the system to
learn new tasks without a need for reprogramming.
Some experimental results for AA-learning using our developmental
algorithm are presented.
MMSP-3.10
An Adaptive Predictor for Media Playout Buffering
Phillip L DeLeon (New Mexico State University),
Cormac J Sreenan (AT&T Labs - Research)
Receiver playout buffers are required to smooth
network delay variations for multimedia streams.
Playout buffer algorithms, such as those commonly
used on the Internet, autoregressively measure the
network delay and its variation and adjust the buffer
delay accordingly to avoid packets arriving too
late. In this work, we attempt to adjust the buffer
delay based on a "prediction" of the network delay
and a similar measure of variation. The philosophy
here is that the use of an accurate prediction will
adjust the buffer delay more effectively by tracking
rapid fluctuations more accurately. A properly chosen buffer
delay can yield a lower total end-to-end delay for a
fixed packet lateness percentage, fewer late packets for
a fixed total end-to-end delay, or both; each of these
metrics is important for applications such as IP telephony.
We present a playout algorithm
based on a simple normalized least-mean-square (NLMS)
adaptive predictor and demonstrate using Internet
packet traces that it can yield reductions in average
total end-to-end delays.
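A minimal sketch of NLMS delay prediction for playout, assuming the predictor sees the last `order` network delays, the variation estimate is an exponentially weighted average of absolute prediction errors, and the playout delay is prediction + beta * variation with beta = 4. All of these choices, the parameter values, and the function name are assumptions, not taken from the paper.

```python
import numpy as np


def nlms_playout(delays, order=4, mu=0.5, eps=1e-6, smooth=0.9, beta=4.0):
    """Predict each packet's network delay with an NLMS filter over the previous
    `order` delays and set playout delay = prediction + beta * variation.

    Assumed (hypothetical) choices: the variation is an exponentially weighted
    mean of |prediction error|, beta = 4, and the filter starts as a
    'repeat the last delay' predictor.
    """
    d = np.asarray(delays, dtype=float)
    w = np.zeros(order)
    w[0] = 1.0
    variation = 0.0
    preds, playout = [], []
    for n in range(order, len(d)):
        x = d[n - order:n][::-1]                   # most recent delay first
        pred = float(w @ x)
        preds.append(pred)
        playout.append(pred + beta * variation)    # uses only past information
        err = d[n] - pred
        w += mu * err * x / (eps + x @ x)          # normalized LMS update
        variation = smooth * variation + (1.0 - smooth) * abs(err)
    return np.array(preds), np.array(playout)


trace = [50.0] * 100 + [80.0] * 100                # hypothetical one-way delays (ms) with a level shift
preds, playout = nlms_playout(trace)
print(round(preds[-1], 3))
```

Because the update is normalized by the input power, once the input window holds only the new delay level the prediction error shrinks by a factor of about (1 - mu) per packet, independent of the absolute delay scale.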