ICASSP99 Advanced Techniques in Multimedia

Advanced Techniques in Multimedia
Robust Speaker Verification via Fusion of Speech and Lip Modalities
Authors: Timothy J Wark, Sridha Sridharan, Vinod Chandran
Paper number 1839
Abstract: This paper investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. It has been previously shown in our own work, and in the work of others, that features extracted from a speaker's moving lips hold speaker dependencies which are complementary to speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms either sub-system. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise.
IC991839.PDF (From Author), IC991839.PDF (Rasterized)

Unsupervised Lip Segmentation Under Natural Conditions
Authors: Marc Liévin, Signal And Image Laboratory, LIS-INPG, France; Franck Luthon, Signal And Image Laboratory, LIS-INPG, France
Paper number 2303
Abstract: An unsupervised algorithm for segmenting a speaker's lips is presented in this paper. A color video sequence of the speaker's face is acquired under natural lighting conditions and without any particular make-up. First, a logarithmic color transform is performed from RGB to HI (hue, intensity) color space and sequence-dependent parameters are evaluated. Second, a statistical approach using Markov random field modeling segments the mouth shape using the red-hue-predominant region and motion in a spatiotemporal neighborhood. Simultaneously, a Region Of Interest (ROI) is automatically extracted. Third, the speaker's lip shape is extracted from the final hue field with good quality results in this challenging situation.
IC992303.PDF (From Author), IC992303.PDF (Rasterized)

Automatic Snakes For Robust Lip Boundaries Extraction
Authors: Patrice Delmas, Pierre-Yves Coulon, Vincent Fristot
Paper number 2312
Abstract: Active contours, or snakes, are widely used in object segmentation for their ability to integrate feature extraction and pixel candidate linking in a single energy-minimizing process. However, their sensitivity to parameter values and initialization is also a widely known problem. The performance of snakes can be enhanced by a better initialization close to the desired solution. We present here a fine mouth region of interest (ROI) extraction using the gray level image and the corresponding gradient information. We link this technique with an original snake method. The Automatic Snakes use spatially varying coefficients so that the contour retains a mouth-like shape throughout its evolution. Our experiments on a large image base prove the robustness of the mouth ROI extraction and automatic snakes algorithms to changes of speaker. The main application of our algorithms is video-conferencing.
IC992312.PDF (From Author), IC992312.PDF (Rasterized)

Dynamic Hand Gesture Understanding - A New Approach
Authors: Mohammed Yeasin, Subhasis Chaudhuri, Indian Institute of Technology, Bombay (India)
Paper number 1218
Abstract: Analysis of a dynamic hand gesture requires processing a spatio-temporal image sequence. The actual length of the sequence varies with each instantiation of the gesture. We propose a novel, vision-based system for automatic interpretation of a limited set of dynamic hand gestures. This involves extracting the temporal signature of the hand motion from the performed gesture, which is subsequently analyzed by a finite state machine to interpret the performed gesture automatically.
IC991218.PDF (Scanned)

Non-Minimum Phase Inverse Filter Methods for Immersive Audio Rendering
Authors: Athanasios Mouchtaris, Panagiotis Reveliotis, Chris Kyriakakis
Paper number 1799
Abstract: Immersive audio systems are being envisioned for applications that include teleconferencing and telepresence; augmented and virtual reality for manufacturing and entertainment; air traffic control, pilot warning, and guidance systems; displays for the visually impaired; distance learning; and professional sound and picture editing for television and film. The principal function of such systems is to synthesize, manipulate and render sound fields in real time. In this paper we examine several signal processing considerations in spatial sound rendering over loudspeakers. We propose two methods that can be used to implement the necessary filters for generating virtual sound sources based on synthetic head-related transfer functions with the same spectral characteristics as those of the real source.
IC991799.PDF (From Author), IC991799.PDF (Rasterized)

Classification Of Time Delay Estimates For Robust Speaker Localization
Authors: Norbert K Strobel, Rudolf Rabenstein
Paper number 1651
Abstract: This paper proposes a solution to the problem of robust speaker localization under adverse acoustic conditions. The approach is based on the classification of time delay estimates. Two classification techniques are investigated in detail: maximum likelihood (ML) classification and classification based on histogram comparison. Their performance under adverse acoustic conditions is compared to outcomes obtained with the traditional approach which uses time delay estimates directly to infer speaker positions. Experiments indicate that the ML classification method provides little improvement over the traditional method.
On the other hand, using histogram classification, we can improve the probability of correct speaker localization by more than 60% compared to either the traditional approach or the ML classification technique.
IC991651.PDF (From Author), IC991651.PDF (Rasterized)

Alpha-Stable Robust Modeling of Background Noise for Enhanced Sound Source Localization
Authors: Panayiotis G Georgiou, Panagiotis Tsakalides, Chris Kyriakakis
Paper number 1817
Abstract: In this paper we address the problem of sound source localization in the presence of impulsive noise for application in immersive telepresence and teleconferencing. Traditional Gaussian modeling of noise signals fails when the signals exhibit impulsive behavior. A new model is used, namely the Symmetric alpha-Stable (SaS), which can better account for the outliers that exist in real-world signals. Real data is used to compare the performance of both the Gaussian and the alpha-stable models. We demonstrate that the alpha-stable model gives a much better approximation to the noise signal than the Gaussian model. Furthermore, we study the problem of Time Delay Estimation (TDE) and we demonstrate the shortcomings of TDE techniques based on second-order statistics when the noise is of SaS nature. We propose an alternative to second-order based methods, based on Fractional Lower-Order Statistics, and demonstrate the achieved improvement via simulation experiments.
IC991817.PDF (From Author), IC991817.PDF (Rasterized)

Sound Onset Detection by Applying Psychoacoustic Knowledge
Authors: Anssi P Klapuri, Signal Processing Laboratory of the Tampere University of Technology, Finland
Paper number 1334
Abstract: A system was designed which is able to detect the perceptual onsets of sounds in acoustic signals. The system is general in regard to the sounds involved and was found to be robust for different kinds of signals.
This was achieved without assuming regularities in the positions of the onsets. In this paper, a method is first proposed that can determine the beginnings of sounds that exhibit onset imperfections, i.e., sounds whose amplitude envelope does not rise monotonically. We then describe the system, which utilizes band-wise processing and a psychoacoustic model of intensity coding to combine the results from the separate frequency bands. The performance of the system was validated by applying it to the detection of onsets in musical signals ranging from rock music to classical and big band recordings.
IC991334.PDF (From Author), IC991334.PDF (Rasterized)

The Developmental Approach to Multimedia Speech Learning
Authors: Juyang Weng, Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA; Yong-Beom Lee, Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824 USA; Colin H. Evans, Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA
Paper number 2205
Abstract: This paper introduces the developmental approach to speech learning, motivated by human cognitive development from infancy to adulthood. Central to the developmental approach is what is called the developmental algorithm. We introduce AA-learning as a basic learning mode for our developmental algorithm. The developmental algorithm enables the system to learn new tasks without a need for reprogramming. Some experimental results for AA-learning using our developmental algorithm are presented.
IC992205.PDF (From Author), IC992205.PDF (Rasterized)

An Adaptive Predictor for Media Playout Buffering
Authors: Phillip L DeLeon, New Mexico State University (USA); Cormac J Sreenan
Paper number 1487
Abstract: Receiver playout buffers are required to smooth network delay variations for multimedia streams.
Playout buffer algorithms such as those commonly used in the Internet autoregressively measure the network delay and variation and adjust the buffer delay accordingly, to avoid packets arriving too late. In this work, we attempt to adjust the buffer delay based on a "prediction" of the network delay and a similar measure of variation. The philosophy here is that the use of an accurate prediction will adjust the buffer delay more effectively by tracking rapid fluctuations more accurately. Proper buffer delay can lead to either (or both) a lower total end-to-end delay for a fixed packet lateness percentage or fewer late packets for a fixed total end-to-end delay, both of which are important metrics for applications such as IP telephony. We present a playout algorithm based on a simple normalized least-mean-square (NLMS) adaptive predictor and demonstrate using Internet packet traces that it can yield reductions in average total end-to-end delays.
IC991487.PDF (From Author), IC991487.PDF (Rasterized)
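
As a rough illustration of the kind of predictor named in the abstract above, the following Python sketch applies a generic NLMS one-step-ahead predictor to a sequence of per-packet network delays. It is not the authors' algorithm: the function name nlms_predict, the filter order, the step size mu, and the regularizer eps are all assumptions chosen for illustration, and the paper's measure of delay variation and its rule for setting the buffer delay are not reproduced here.

```python
def nlms_predict(delays, order=4, mu=0.5, eps=1e-6):
    """One-step-ahead NLMS prediction of a delay sequence (illustrative only).

    delays: list of measured network delays, one per packet (any unit).
    order:  number of past delays fed to the predictor (assumed value).
    mu:     NLMS step size, stable for 0 < mu < 2 (assumed value).
    eps:    small constant that avoids division by zero (assumed value).
    Returns a list of predictions; the first `order` entries stay at 0.0
    because there is not yet enough history to predict from.
    """
    w = [0.0] * order                      # adaptive filter weights
    preds = [0.0] * len(delays)
    for n in range(order, len(delays)):
        x = delays[n - order:n][::-1]      # most recent delay first
        preds[n] = sum(wi * xi for wi, xi in zip(w, x))
        err = delays[n] - preds[n]         # prediction error for packet n
        norm = eps + sum(xi * xi for xi in x)
        # Normalized LMS update: step scaled by the input energy.
        w = [wi + mu * err * xi / norm for wi, xi in zip(w, x)]
    return preds


if __name__ == "__main__":
    # Hypothetical trace: a constant 50 ms delay, used only to show convergence.
    trace = [50.0] * 200
    print(nlms_predict(trace)[-1])
```

In a playout scheme of the type the abstract describes, the predicted delay would be combined with a variation estimate to set the buffer delay for upcoming packets; how that combination is done is left open here because the abstract does not specify it.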