
Abstract: Session MMSP-3


MMSP-3.1  

Robust Speaker Verification via Fusion of Speech and Lip Modalities
Timothy J Wark, Sridha Sridharan, Vinod Chandran (Queensland University of Technology)

This paper investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. Our own previous work, and the work of others, has shown that features extracted from a speaker's moving lips hold speaker dependencies which are complementary to speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms either sub-system. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise.
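As a rough illustration of weighted fusion, the sketch below combines one score per modality with a single weight. The abstract does not give the fusion rule or the weighting technique, so the convex combination, the function names, the score range and the threshold are all assumptions made for illustration.

```python
def fuse_scores(speech_score, lip_score, speech_weight):
    """Convex combination of two per-modality scores (assumed fusion rule)."""
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must lie in [0, 1]")
    return speech_weight * speech_score + (1.0 - speech_weight) * lip_score


def verify(speech_score, lip_score, speech_weight, threshold):
    """Accept the identity claim when the fused score reaches the threshold."""
    return fuse_scores(speech_score, lip_score, speech_weight) >= threshold


# Hypothetical scores in [0, 1]: background noise has degraded the speech score.
noisy_speech, lip = 0.35, 0.80
for weight in (0.9, 0.3):
    accepted = verify(noisy_speech, lip, speech_weight=weight, threshold=0.5)
    print(f"speech weight {weight}: accepted={accepted}")
```

With these made-up numbers, a weight that trusts the noisy speech score rejects a genuine speaker, while shifting weight toward the lip score accepts, which is the kind of effect a noise-dependent weighting is meant to capture.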


MMSP-3.2  

Unsupervised Lip Segmentation Under Natural Conditions
Marc Lievin, Franck Luthon (Signal And Image Laboratory, LIS-INPG, France)

An unsupervised algorithm for segmenting a speaker's lips is presented in this paper. A color video sequence of the speaker's face is acquired under natural lighting conditions and without any particular make-up. First, a logarithmic color transform is performed from RGB to HI (hue, intensity) color space and sequence-dependent parameters are evaluated. Second, a statistical approach using Markov random field modeling segments the mouth shape using the region where red hue predominates and motion in a spatiotemporal neighborhood. Simultaneously, a Region Of Interest (ROI) is automatically extracted. Third, the speaker's lip shape is extracted from the final hue field with good quality results in this challenging situation.


MMSP-3.3  

Automatic Snakes For Robust Lip Boundaries Extraction
Patrice Delmas, Pierre-Yves Coulon, Vincent Fristot (LIS-INPG)

Active contours, or snakes, are widely used in object segmentation for their ability to integrate feature extraction and pixel candidate linking in a single energy-minimizing process. However, their sensitivity to parameter values and initialization is also a widely known problem. The performance of snakes can be enhanced by better initialization close to the desired solution. We present here a fine mouth region of interest (ROI) extraction using the gray-level image and corresponding gradient information. We link this technique with an original snake method. The Automatic Snakes use spatially varying coefficients to keep a mouth-like shape throughout their evolution. Our experiments on a large image base demonstrate the robustness of the ROI mouth extraction and automatic snakes algorithms to changes of speaker. The main application of our algorithms is video-conferencing.


MMSP-3.4  

Dynamic Hand Gesture Understanding - A New Approach
Mohammed  (Electro-Technical Laboratory), Subhasis Chaudhuri (Indian Institute of Technology, Bombay)

Analysis of a dynamic hand gesture requires processing a spatio-temporal image sequence. The actual length of the sequence varies with each instantiation of the gesture. We propose a novel, vision-based system for automatic interpretation of a limited set of dynamic hand gestures. This involves extracting the temporal signature of the hand motion from the performed gesture; the signature is subsequently analyzed by a finite state machine to automatically interpret the performed gesture.
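A minimal sketch of how a finite state machine can interpret a variable-length motion signature is given below. The abstract does not describe the states, the signature encoding or the gesture set, so the direction symbols, the transition table and the gesture names ("sweep", "lift") are hypothetical.

```python
# Hypothetical transition table: state -> {motion symbol -> next state}.
# Self-loops let a state absorb any number of repeated symbols, so the same
# gesture is recognized regardless of how many frames it lasted.
TRANSITIONS = {
    "start": {"right": "moving_right", "up": "moving_up"},
    "moving_right": {"right": "moving_right", "left": "returning_left"},
    "returning_left": {"left": "returning_left"},
    "moving_up": {"up": "moving_up", "down": "returning_down"},
    "returning_down": {"down": "returning_down"},
}

# Hypothetical accepting states and the gesture each one stands for.
ACCEPTING = {"returning_left": "sweep", "returning_down": "lift"}


def interpret(signature):
    """Return the gesture name for a sequence of motion symbols, or None."""
    state = "start"
    for symbol in signature:
        state = TRANSITIONS.get(state, {}).get(symbol)
        if state is None:  # no valid transition: not a known gesture
            return None
    return ACCEPTING.get(state)


print(interpret(["right", "right", "left"]))
print(interpret(["right"] * 5 + ["left"] * 2))
print(interpret(["up", "left"]))
```

The first two calls differ only in sequence length and reach the same accepting state; the third follows no valid path and is rejected.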


MMSP-3.5  

Non-Minimum Phase Inverse Filter Methods for Immersive Audio Rendering
Athanasios Mouchtaris (USC Integrated Media Systems Center), Panagiotis Reveliotis (), Chris Kyriakakis (USC Integrated Media Systems Center)

Immersive audio systems are being envisioned for applications that include teleconferencing and telepresence; augmented and virtual reality for manufacturing and entertainment; air traffic control, pilot warning, and guidance systems; displays for the visually-impaired; distance learning; and professional sound and picture editing for television and film. The principal function of such systems is to synthesize, manipulate and render sound fields in real time. In this paper we examine several signal processing considerations in spatial sound rendering over loudspeakers. We propose two methods that can be used to implement the necessary filters for generating virtual sound sources based on synthetic head-related transfer functions with the same spectral characteristics as those of the real source.


MMSP-3.6  

Classification of Time Delay Estimates for Robust Speaker Localization
Norbert K Strobel, Rudolf Rabenstein (University of Erlangen-Nuernberg, Telecommunications Laboratory)

This paper proposes a solution to the problem of robust speaker localization under adverse acoustic conditions. The approach is based on the classification of time delay estimates. Two classification techniques are investigated in detail: maximum likelihood (ML) classification and classification based on histogram comparison. Their performance under adverse acoustic conditions is compared to outcomes obtained with the traditional approach which uses time delay estimates directly to infer speaker positions. Experiments indicate that the ML classification method provides little improvement over the traditional method. On the other hand, using histogram classification, we can improve the probability of correct speaker localization by more than 60% compared to either the traditional approach or the ML classification technique.
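To make the idea of histogram comparison concrete, the sketch below bins a batch of time delay estimates and picks the candidate position whose reference histogram overlaps it most. The abstract does not say which comparison measure is used, so histogram intersection, the bin edges, the position labels and the reference histograms are assumptions for illustration only.

```python
def histogram(values, edges):
    """Normalized histogram over bins [edges[i], edges[i+1]); outliers are dropped."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(counts)):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def classify(delay_estimates, references, edges):
    """Return the label whose reference histogram best matches the observations."""
    observed = histogram(delay_estimates, edges)

    def intersection(reference):
        # Histogram intersection: shared probability mass (assumed measure).
        return sum(min(a, b) for a, b in zip(observed, reference))

    return max(references, key=lambda label: intersection(references[label]))


# Hypothetical delay bins (in samples) and per-position reference histograms.
EDGES = [-4.0, -2.0, 0.0, 2.0, 4.0]
REFERENCES = {
    "left": [0.7, 0.2, 0.1, 0.0],
    "right": [0.0, 0.1, 0.2, 0.7],
}

# One estimate (1.0) is a reverberation-style outlier pointing the wrong way.
print(classify([-3.1, -2.7, -3.5, 1.0, -2.2, -0.5], REFERENCES, EDGES))
```

Because the decision uses the whole distribution of estimates rather than each estimate directly, a few erroneous delays do not flip the result.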


MMSP-3.7  

Alpha-Stable Robust Modeling of Background Noise for Enhanced Sound Source Localization
Panayiotis G Georgiou (USC Integrated Media Systems Center), Panagiotis Tsakalides (USC Signal and Image Processing Institute), Chris Kyriakakis (USC Integrated Media Systems Center)

In this paper we address the problem of sound source localization in the presence of impulsive noise for application in immersive telepresence and teleconferencing. Traditional Gaussian modeling of noise signals fails when the signals exhibit impulsive behavior. A new model is used, namely the Symmetric alpha-Stable (SaS), which can better account for the outliers that exist in real-world signals. Real data is used to compare the performance of both the Gaussian and the alpha-stable models. We demonstrate that the alpha-stable model gives a much better approximation to the noise signal than the Gaussian model. Furthermore, we study the problem of Time Delay Estimation (TDE) and we demonstrate the shortcomings of TDE techniques based on second-order statistics when the noise is of SaS nature. We propose an alternative to second-order based methods, based on Fractional Lower-Order Statistics, and demonstrate the achieved improvement via simulation experiments.
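The sketch below shows one common way fractional lower-order statistics are used for time delay estimation: a fractional lower-order covariance, in which both signals are raised to a signed fractional power before being correlated, so that impulsive samples are compressed. It is not taken from the paper; the function names, the exponents a = b = 0.5, the synthetic Gaussian source and the added impulses are illustrative assumptions.

```python
import math
import random


def signed_power(value, exponent):
    """|value|**exponent with the sign of value preserved."""
    if value == 0:
        return 0.0
    return math.copysign(abs(value) ** exponent, value)


def floc_delay(x, y, max_lag, a=0.5, b=0.5):
    """Lag maximizing the sample fractional lower-order covariance of x and y."""
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        total, count = 0.0, 0
        for n in range(len(x)):
            k = n + lag
            if 0 <= k < len(y):
                total += signed_power(x[n], a) * signed_power(y[k], b)
                count += 1
        if count and total / count > best_value:
            best_lag, best_value = lag, total / count
    return best_lag


# Synthetic example: y is x delayed by 3 samples plus two large impulses.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(400)]
y = [0.0] * 3 + x[:-3]
y[50] += 50.0
y[200] -= 50.0
print(floc_delay(x, y, max_lag=10))
```

The fractional exponents shrink the contribution of each impulse from its raw amplitude to roughly its square root, which is why the correlation peak at the true lag stays dominant.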


MMSP-3.8  

Sound Onset Detection by Applying Psychoacoustic Knowledge
Anssi P Klapuri (Signal Processing Laboratory of the Tampere University of Technology, Finland)

A system was designed that detects the perceptual onsets of sounds in acoustic signals. The system is general with regard to the sounds involved and was found to be robust for different kinds of signals. This was achieved without assuming regularities in the positions of the onsets. In this paper, a method is first proposed that can determine the beginnings of sounds that exhibit onset imperfections, i.e., sounds whose amplitude envelope does not rise monotonically. Then we describe the system, which utilizes band-wise processing and a psychoacoustic model of intensity coding to combine the results from the separate frequency bands. The performance of the system was validated by applying it to the detection of onsets in musical signals ranging from rock music to classical and big band recordings.


MMSP-3.9  

The Developmental Approach to Multimedia Speech Learning
Juyang Weng (Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA), Yong-Beom Lee (Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824 USA), Colin H. Evans (Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA)

This paper introduces the developmental approach to speech learning, motivated by human cognitive development from infancy to adulthood. Central to the developmental approach is what is called the developmental algorithm. We introduce AA-learning as a basic learning mode for our developmental algorithm. The developmental algorithm enables the system to learn new tasks without a need for reprogramming. Some experimental results for AA-learning using our developmental algorithm are presented.


MMSP-3.10  

An Adaptive Predictor for Media Playout Buffering
Phillip L DeLeon (New Mexico State University), Cormac J Sreenan (AT&T Labs - Research)

Receiver playout buffers are required to smooth network delay variations for multimedia streams. Playout buffer algorithms, such as those commonly used in the Internet, autoregressively measure the network delay and its variation and adjust the buffer delay accordingly, to avoid packets arriving too late. In this work, we attempt to adjust the buffer delay based on a "prediction" of the network delay and a similar measure of variation. The philosophy here is that the use of an accurate prediction will adjust the buffer delay more effectively by tracking rapid fluctuations more accurately. Proper buffer delay can lead to a lower total end-to-end delay for a fixed packet lateness percentage, to fewer late packets for a fixed total end-to-end delay, or to both; these are important metrics for applications such as IP telephony. We present a playout algorithm based on a simple normalized least-mean-square (NLMS) adaptive predictor and demonstrate using Internet packet traces that it can yield reductions in average total end-to-end delays.
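The following is a minimal sketch of a one-step-ahead NLMS predictor applied to a sequence of network delays. The abstract does not specify the filter order, step size or how the prediction is turned into a buffer delay, so the parameter values and the function name are assumptions, and the constant delay trace is synthetic.

```python
def nlms_predict(delays, order=4, step=0.5, eps=1e-6):
    """Return one-step-ahead predictions; predictions[i] estimates delays[i]
    from the `order` preceding samples (zeros before the trace starts)."""
    weights = [0.0] * order
    history = [0.0] * order  # most recent delay first
    predictions = []
    for delay in delays:
        prediction = sum(w * h for w, h in zip(weights, history))
        predictions.append(prediction)
        error = delay - prediction
        # Normalizing by the input energy keeps the update stable
        # regardless of the scale of the delays.
        energy = eps + sum(h * h for h in history)
        weights = [w + step * error * h / energy
                   for w, h in zip(weights, history)]
        history = [delay] + history[:-1]
    return predictions


# Synthetic trace: a constant 100 ms network delay.
trace = [100.0] * 50
predicted = nlms_predict(trace)
print(round(predicted[-1], 3))
```

A playout algorithm along the lines described above would add a safety margin based on a measure of delay variation to each prediction before scheduling playout; that margin is left out here because the abstract does not define it.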




Last Update:  February 4, 1999         Ingo Höntsch