Advanced Techniques in Multimedia

Robust Speaker Verification via Fusion of Speech and Lip Modalities

Authors:

Timothy J Wark
Sridha Sridharan
Vinod Chandran

Page (NA) Paper number 1839

Abstract:

This paper investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. It has been previously shown in our own work, and in the work of others, that features extracted from a speaker's moving lips hold speaker dependencies which are complementary to speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms either sub-system. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise.
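The abstract does not state the fusion rule or how the weighting is determined. As a minimal sketch only, assuming each modality produces a score on a common scale and that the scores are combined by a linear weighted sum, the fused decision could look like the following. The function names, the score range, and the threshold are hypothetical.

```python
def fuse_scores(speech_score, lip_score, speech_weight):
    """Linear weighted sum of two per-modality verification scores.

    Assumption: both scores are on a comparable scale (for example [0, 1]).
    speech_weight lies in [0, 1]; the lip modality gets the remainder.
    """
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must lie in [0, 1]")
    return speech_weight * speech_score + (1.0 - speech_weight) * lip_score


def verify(speech_score, lip_score, speech_weight, threshold):
    """Accept the claimed identity when the fused score reaches the threshold."""
    return fuse_scores(speech_score, lip_score, speech_weight) >= threshold


# Hypothetical noisy trial: the speech score is degraded, the lip score is not.
noisy_speech, clean_lip = 0.2, 0.9
print(verify(noisy_speech, clean_lip, speech_weight=0.3, threshold=0.5))  # lips dominate
print(verify(noisy_speech, clean_lip, speech_weight=0.9, threshold=0.5))  # speech dominates
```

In this toy trial the same pair of scores is accepted or rejected depending only on the weight, which illustrates why the choice of weighting matters under noise.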

IC991839.PDF (From Author) IC991839.PDF (Rasterized)



Unsupervised Lip Segmentation Under Natural Conditions

Authors:

Marc Liévin, Signal And Image Laboratory, LIS-INPG, France (France)
Franck Luthon, Signal And Image Laboratory, LIS-INPG, France (France)

Page (NA) Paper number 2303

Abstract:

An unsupervised algorithm for segmenting a speaker's lips is presented in this paper. A color video sequence of the speaker's face is acquired under natural lighting conditions and without any particular make-up. First, a logarithmic color transform is performed from RGB to HI (hue, intensity) color space and sequence-dependent parameters are evaluated. Second, a statistical approach using Markov random field modeling segments the mouth shape using the region where red hue is predominant and motion in a spatiotemporal neighborhood. Simultaneously, a Region Of Interest (ROI) is automatically extracted. Third, the speaker's lip shape is extracted from the final hue field with good quality results in this challenging situation.

IC992303.PDF (From Author) IC992303.PDF (Rasterized)



Automatic Snakes For Robust Lip Boundaries Extraction

Authors:

Patrice Delmas
Pierre-Yves Coulon
Vincent Fristot

Page (NA) Paper number 2312

Abstract:

Active contours, or snakes, are widely used in object segmentation for their ability to integrate feature extraction and pixel candidate linking in a single energy-minimizing process. However, their sensitivity to parameter values and initialization is also a widely known problem. The performance of snakes can be enhanced by a better initialization close to the desired solution. We present here a fine mouth region of interest (ROI) extraction using the gray-level image and the corresponding gradient information. We link this technique with an original snake method. The Automatic Snakes use spatially varying coefficients to remain in a mouth-like shape throughout their evolution. Our experiments on a large image database demonstrate the robustness of the mouth ROI extraction and automatic snakes algorithms to changes of speaker. The main application of our algorithms is video-conferencing.

IC992312.PDF (From Author) IC992312.PDF (Rasterized)



Dynamic Hand Gesture Understanding - A New Approach

Authors:

Mohammed Yeasin
Subhasis Chaudhuri, Indian Institute of Technology, Bombay (India)

Page (NA) Paper number 1218

Abstract:

Analysis of a dynamic hand gesture requires processing a spatio-temporal image sequence. The actual length of the sequence varies with each instantiation of the gesture. We propose a novel, vision-based system for automatic interpretation of a limited set of dynamic hand gestures. The system extracts the temporal signature of the hand motion from the performed gesture; this signature is subsequently analyzed by a finite state machine to automatically interpret the gesture.
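To illustrate how a finite state machine can interpret a motion signature whose length varies between performances, here is a minimal sketch. It assumes the signature has already been reduced to a sequence of coarse direction symbols; the symbols, states, and gesture labels are hypothetical and are not taken from the paper.

```python
def run_gesture_fsm(transitions, start, accepting, symbols):
    """Run a deterministic FSM over a sequence of motion symbols.

    Returns the gesture label attached to the final state, or None when the
    sequence hits an undefined transition or ends in a non-accepting state.
    """
    state = start
    for sym in symbols:
        state = transitions.get((state, sym))
        if state is None:
            return None
    return accepting.get(state)


# Hypothetical gesture vocabulary. Self-loops absorb repeated symbols, so the
# same gesture is recognized regardless of how many frames it lasts.
TRANSITIONS = {
    ("start", "right"): "moving_right",
    ("moving_right", "right"): "moving_right",
    ("moving_right", "up"): "moving_up",
    ("moving_up", "up"): "moving_up",
    ("start", "left"): "moving_left",
    ("moving_left", "left"): "moving_left",
}
ACCEPTING = {"moving_up": "right_then_up", "moving_left": "swipe_left"}

fast = ["right", "up"]
slow = ["right", "right", "right", "up", "up", "up", "up"]
print(run_gesture_fsm(TRANSITIONS, "start", ACCEPTING, fast))
print(run_gesture_fsm(TRANSITIONS, "start", ACCEPTING, slow))
```

Both the fast and the slow performance end in the same accepting state, which is the property that makes a state machine a natural fit for sequences of varying length.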

IC991218.PDF (Scanned)



Non-Minimum Phase Inverse Filter Methods for Immersive Audio Rendering

Authors:

Athanasios Mouchtaris
Panagiotis Reveliotis
Chris Kyriakakis

Page (NA) Paper number 1799

Abstract:

Immersive audio systems are being envisioned for applications that include teleconferencing and telepresence; augmented and virtual reality for manufacturing and entertainment; air traffic control, pilot warning, and guidance systems; displays for the visually-impaired; distance learning; and professional sound and picture editing for television and film. The principal function of such systems is to synthesize, manipulate and render sound fields in real time. In this paper we examine several signal processing considerations in spatial sound rendering over loudspeakers. We propose two methods that can be used to implement the necessary filters for generating virtual sound sources based on synthetic head-related transfer functions with the same spectral characteristics as those of the real source.

IC991799.PDF (From Author) IC991799.PDF (Rasterized)



Classification Of Time Delay Estimates For Robust Speaker Localization

Authors:

Norbert K Strobel
Rudolf Rabenstein

Page (NA) Paper number 1651

Abstract:

This paper proposes a solution to the problem of robust speaker localization under adverse acoustic conditions. The approach is based on the classification of time delay estimates. Two classification techniques are investigated in detail: maximum likelihood (ML) classification and classification based on histogram comparison. Their performance under adverse acoustic conditions is compared to outcomes obtained with the traditional approach which uses time delay estimates directly to infer speaker positions. Experiments indicate that the ML classification method provides little improvement over the traditional method. On the other hand, using histogram classification, we can improve the probability of correct speaker localization by more than 60% compared to either the traditional approach or the ML classification technique.
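The abstract does not say which histogram comparison measure is used. As a rough sketch, assuming each candidate speaker position has a reference histogram of time delay estimates and that histograms are compared with an L1 distance, classification could be written as follows. The bin edges, position names, and reference histograms are made up for illustration.

```python
def histogram(values, edges):
    """Normalized histogram of values over bins [edges[i], edges[i+1]).

    Values outside the covered range are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def classify(observed_delays, reference_hists, edges):
    """Return the position whose reference histogram is closest in L1 distance."""
    obs = histogram(observed_delays, edges)

    def l1_distance(ref):
        return sum(abs(a - b) for a, b in zip(obs, ref))

    return min(reference_hists, key=lambda name: l1_distance(reference_hists[name]))


# Hypothetical setup: delays in milliseconds, two candidate positions.
EDGES = [-1.0, -0.5, 0.0, 0.5, 1.0]
REFERENCES = {
    "left": [0.7, 0.2, 0.1, 0.0],
    "right": [0.0, 0.1, 0.2, 0.7],
}
# A batch of estimates with one outlier (0.6) caused by adverse acoustics.
batch = [-0.8, -0.7, -0.9, 0.6, -0.3]
print(classify(batch, REFERENCES, EDGES))
```

Using a whole batch of estimates means the single outlier shifts the observed histogram only slightly, whereas mapping that one estimate directly to a position would point to the wrong side.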

IC991651.PDF (From Author) IC991651.PDF (Rasterized)



Alpha-Stable Robust Modeling of Background Noise for Enhanced Sound Source Localization

Authors:

Panayiotis G Georgiou
Panagiotis Tsakalides
Chris Kyriakakis

Page (NA) Paper number 1817

Abstract:

In this paper we address the problem of sound source localization in the presence of impulsive noise for application in immersive telepresence and teleconferencing. Traditional Gaussian modeling of noise signals fails when the signals exhibit impulsive behavior. A new model is used, namely the Symmetric alpha-Stable (SaS), which can better account for the outliers that exist in real-world signals. Real data is used to compare the performance of both the Gaussian and the alpha-stable models. We demonstrate that the alpha-stable model gives a much better approximation to the noise signal than the Gaussian model. Furthermore, we study the problem of Time Delay Estimation (TDE) and we demonstrate the shortcomings of TDE techniques based on second-order statistics when the noise is of SaS nature. We propose an alternative to second-order based methods, based on Fractional Lower-Order Statistics, and demonstrate the achieved improvement via simulation experiments.
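The abstract does not give the estimator itself. One common way to build a time delay estimator from fractional lower-order statistics is to replace the cross-correlation with a fractional lower-order covariance, in which each sample is raised to a signed fractional power before the product is summed. The sketch below follows that idea; it is an assumption about the general technique, not the authors' method, and the exponents, signal, and impulse are illustrative.

```python
import random


def signed_power(z, p):
    """Return |z|**p with the sign of z, which compresses large outliers for p < 1."""
    if z == 0:
        return 0.0
    return (abs(z) ** p) * (1.0 if z > 0 else -1.0)


def floc_delay(x, y, max_lag, a=0.5, b=0.5):
    """Estimate the delay of y relative to x, in samples.

    Picks the lag that maximizes the sum of x[n]^<a> * y[n + lag]^<b>,
    where ^<p> denotes the signed power. The exponents a and b are tunable
    assumptions; small values limit the influence of impulsive samples.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                total += signed_power(x[n], a) * signed_power(y[m], b)
        if total > best_val:
            best_lag, best_val = lag, total
    return best_lag


# Synthetic check: y is x delayed by 3 samples, plus one large impulse.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.0] * 3 + x[:-3]
y[100] += 100.0  # impulsive disturbance
print(floc_delay(x, y, max_lag=10))
```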

IC991817.PDF (From Author) IC991817.PDF (Rasterized)



Sound Onset Detection by Applying Psychoacoustic Knowledge

Authors:

Anssi P Klapuri, Signal Processing Laboratory of the Tampere University of Technology, Finland (Finland)

Page (NA) Paper number 1334

Abstract:

A system was designed that is able to detect the perceptual onsets of sounds in acoustic signals. The system is general in regard to the sounds involved and was found to be robust for different kinds of signals. This was achieved without assuming regularities in the positions of the onsets. In this paper, a method is first proposed that can determine the beginnings of sounds that exhibit onset imperfections, i.e., the amplitude envelope of which does not rise monotonically. Then we describe the mentioned system, which utilizes band-wise processing and a psychoacoustic model of intensity coding to combine the results from the separate frequency bands. The performance of the system was validated by applying it to the detection of onsets in musical signals ranging from rock music to classical and big band recordings.
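The abstract gives no formulas, so the following is only a single-band sketch of one plausible ingredient: detecting onsets from the change in the logarithm of the amplitude envelope, so that an increase is judged relative to the current level instead of in absolute terms. The log-difference detector, the threshold, and the envelope values are assumptions for illustration; the band-wise combination described above is not shown.

```python
import math


def relative_difference(envelope, floor=1e-6):
    """First-order difference of the log amplitude envelope.

    The floor avoids taking the logarithm of zero in silent frames.
    """
    logs = [math.log(max(a, floor)) for a in envelope]
    return [logs[i] - logs[i - 1] for i in range(1, len(logs))]


def pick_onsets(envelope, threshold):
    """Return frame indices where the relative difference first crosses the threshold."""
    diff = relative_difference(envelope)
    onsets = []
    for i, value in enumerate(diff):
        previous = diff[i - 1] if i > 0 else 0.0
        if value >= threshold and previous < threshold:
            onsets.append(i + 1)  # diff[i] describes the step into frame i + 1
    return onsets


# Hypothetical envelope of one frequency band: two notes, the second starting
# from a higher residual level than the first.
band_envelope = [0.01, 0.01, 0.5, 0.8, 0.7, 0.1, 0.1, 0.9, 1.0]
print(pick_onsets(band_envelope, threshold=1.0))
```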

IC991334.PDF (From Author) IC991334.PDF (Rasterized)



The Developmental Approach to Multimedia Speech Learning

Authors:

Juyang Weng, Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA (USA)
Yong-Beom Lee, Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824 USA (USA)
Colin H. Evans, Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA (USA)

Page (NA) Paper number 2205

Abstract:

This paper introduces the developmental approach to speech learning, motivated by human cognitive development from infancy to adulthood. Central to the developmental approach is what is called the developmental algorithm. We introduce AA-learning as a basic learning mode for our developmental algorithm. The developmental algorithm enables the system to learn new tasks without a need for reprogramming. Some experimental results for AA-learning using our developmental algorithm are presented.

IC992205.PDF (From Author) IC992205.PDF (Rasterized)



An Adaptive Predictor for Media Playout Buffering

Authors:

Phillip L DeLeon, New Mexico State University (USA)
Cormac J Sreenan

Page (NA) Paper number 1487

Abstract:

Receiver playout buffers are required to smooth network delay variations for multimedia streams. Playout buffer algorithms, such as those commonly used in the Internet, autoregressively measure the network delay and its variation and adjust the buffer delay accordingly to avoid packets arriving too late. In this work, we attempt to adjust the buffer delay based on a "prediction" of the network delay and a similar measure of variation. The philosophy here is that an accurate prediction will adjust the buffer delay more effectively by tracking rapid fluctuations more accurately. Proper buffer delay can lead to a lower total end-to-end delay for a fixed packet lateness percentage, fewer late packets for a fixed total end-to-end delay, or both; these are important metrics for applications such as IP telephony. We present a playout algorithm based on a simple normalized least-mean-square (NLMS) adaptive predictor and demonstrate using Internet packet traces that it can yield reductions in average total end-to-end delays.
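As a minimal sketch of the NLMS idea named above, the function below predicts each packet's network delay from the previous few delays and then updates its weights from the prediction error. The filter order, step size, and regularization constant are assumed values, and the step from predicted delay to an actual playout time (including the variation term) is not shown.

```python
def nlms_predict_delays(delays, order=4, mu=0.5, eps=1e-6):
    """One-step-ahead NLMS prediction of a network delay sequence.

    Returns a list whose k-th entry is the prediction of delays[order + k],
    computed before that delay is observed. order, mu and eps are
    illustrative choices, not values from the paper.
    """
    weights = [0.0] * order
    predictions = []
    for n in range(order, len(delays)):
        # Input vector: the most recent `order` delays, newest first.
        x = delays[n - order:n][::-1]
        y_hat = sum(w * xi for w, xi in zip(weights, x))
        predictions.append(y_hat)
        # NLMS update: step proportional to the error, normalized by input power.
        error = delays[n] - y_hat
        power = eps + sum(xi * xi for xi in x)
        weights = [w + mu * error * xi / power for w, xi in zip(weights, x)]
    return predictions


# Hypothetical trace: a steady 50 ms network delay.
trace = [50.0] * 200
preds = nlms_predict_delays(trace)
print(len(preds), preds[-1])
```

On this constant trace the prediction error shrinks at every update, so the final predictions sit very close to 50 ms. On a real trace the step size trades tracking speed against noise sensitivity.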

IC991487.PDF (From Author) IC991487.PDF (Rasterized)
