Authors:
Qian Huang,
Zhu Liu,
Aaron E Rosenberg,
David Gibbon,
Behzad Shahraray,
Page (NA) Paper number 2373
Abstract:
This paper addresses the problem of generating semantically meaningful
content by integrating information from different media. The goal is
to automatically construct a compact yet meaningful abstraction of the
multimedia data that can serve as an effective index table, allowing
users to browse large amounts of data in a non-linear fashion
with flexibility, efficiency, and confidence. We propose an integrated
solution in the context of news broadcast that simultaneously utilizes
cues from video, audio, and text to achieve the goal. Some experimental
results are presented and discussed in the paper.
Authors:
Jonathan Foote,
John Boreczky,
Lynn Wilcox,
Page (NA) Paper number 1490
Abstract:
This paper describes a method for finding segments in video-recorded
meetings that correspond to presentations. These segments serve as
indexes into the recorded meeting. The system automatically detects
intervals of video that correspond to presentation slides. We assume
that only one person speaks during an interval when slides are detected.
Thus these intervals can be used as training data for a speaker spotting
system. An HMM is automatically constructed and trained on the audio
data from each slide interval. A Viterbi alignment then resegments
the audio according to speaker. Since the same speaker may talk across
multiple slide intervals, the acoustic data from these intervals is
clustered to yield an estimate of the number of distinct speakers and
their order. This allows the individual presentations in the video
to be identified from the location of each presenter's speech. Results
are presented for a corpus of six meeting videos.
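The speaker-counting step described above can be sketched with agglomerative clustering over per-interval acoustic features. The feature vectors, separation, and distance threshold below are illustrative assumptions, not the paper's actual front end:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical per-interval feature vectors (e.g. mean cepstra):
# three slide intervals from one presenter, two from another.
rng = np.random.default_rng(0)
speaker_a = rng.normal(0.0, 0.1, size=(3, 12))
speaker_b = rng.normal(3.0, 0.1, size=(2, 12))
intervals = np.vstack([speaker_a, speaker_b])

# Agglomerative clustering of the interval features; cutting the
# dendrogram at a distance threshold estimates the number of
# distinct speakers and which intervals they share.
tree = linkage(intervals, method="average", metric="euclidean")
labels = fcluster(tree, t=1.0, criterion="distance")
n_speakers = len(set(labels))
```

Intervals assigned the same cluster label can then be pooled as training data for one speaker model, as in the HMM resegmentation the abstract describes.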
Authors:
Caterina Saraceno,
Page (NA) Paper number 1685
Abstract:
Computer technology allows for large collections of digital archived
material. At the same time, the increasing availability of potentially
interesting data makes it difficult to retrieve the desired information.
Currently, access to such information is limited to textual queries
or characteristics such as color or texture. The demand for new solutions
allowing common users to easily access, store and retrieve relevant
audio-visual information is becoming urgent. One possible solution
to this problem is to hierarchically organize the audio-visual data
so as to create a nested indexing structure which provides efficient
access to relevant information at each level of the hierarchy. This
work presents an automatic methodology to extract and hierarchically
represent the semantics of the content, based on joint audio and
visual analysis. Descriptions of each medium (audio, video) are used
to recognize higher-level meaningful structures, such as specific
types of scenes or, at the highest level, correlations beyond the
temporal organization of the information, reflecting classes of
visual, audio, or audio-visual content. Once a hierarchy is extracted
from the data analysis, a nested indexing structure can be created
to access relevant information at a specific level of detail, according
to the user requirements.
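A nested indexing structure of the kind described can be pictured as a small hierarchy that is descended one level per query key. The level names and time intervals below are invented for illustration:

```python
# Hypothetical nested index: program -> scene class -> time intervals.
index = {
    "news program": {
        "anchor scenes": [(0.0, 12.5), (310.0, 322.0)],
        "report scenes": [(12.5, 310.0)],
    }
}

def lookup(index, *keys):
    """Descend the hierarchy one level per key; supplying more keys
    accesses the data at a finer level of detail."""
    node = index
    for key in keys:
        node = node[key]
    return node

anchor_intervals = lookup(index, "news program", "anchor scenes")
```

A user needing only coarse access stops at a shallow level; deeper keys reach specific audio-visual segments, matching the level-of-detail access the abstract proposes.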
Authors:
Brian P Clarkson,
Alex Pentland,
Page (NA) Paper number 2385
Abstract:
A truly personal and reactive computer system should have access to
the same information as its user, including the ambient sights and
sounds. To this end, we have developed a system for extracting events
and scenes from natural audio/visual input. We find our system can
(without any prior labeling of data) cluster the audio/visual data
into events, such as passing through doors and crossing the street.
Also, we hierarchically cluster these events into scenes and get clusters
that correlate with visiting the supermarket, or walking down a busy
street.
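The unsupervised event clustering can be sketched as clustering a stream of ambient feature vectors without labels; contiguous runs of one cluster label then mark an event. The two-event feature stream below is a synthetic stand-in for real audio/visual input:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical ambient feature stream: two segments with distinct
# audio/visual statistics (e.g. indoors, then a busy street).
rng = np.random.default_rng(1)
indoors = rng.normal(0.0, 0.2, size=(50, 4))
street = rng.normal(2.0, 0.2, size=(50, 4))
stream = np.vstack([indoors, street])

# Cluster with no prior labeling of the data; each contiguous run
# of a single label corresponds to one event.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(stream)
```

Clustering these event labels again at a longer time scale would give the scene-level grouping (supermarket visit, street walk) the abstract reports.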
Authors:
Shingo Uchihashi,
Jonathan Foote,
Page (NA) Paper number 1494
Abstract:
This paper presents methods of generating compact pictorial summarizations
of video. By calculating a measure of shot importance video can be
summarized by de-emphasizing or discarding less important information,
such as repeated or common scenes. In contrast to other approaches
that present keyframes for each shot, this measure allows summarization
by presenting only the most important shots. Selected keyframes can
also be resized depending on their relative importance. We present
an efficient packing algorithm that constructs a pictorial representation
from differently-sized keyframes. This results in a compact and visually
pleasing summary reminiscent of a comic book.
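A shot-importance measure of this flavor can be sketched by weighting each shot's duration by the rarity of its cluster, so repeated or common scenes are de-emphasized. The shot data and the exact weighting are assumptions patterned on the idea, not the published formula:

```python
import math

# Hypothetical shots: (scene cluster, duration in seconds). Shots in
# common clusters (repeated scenes) should rank low; rare, long shots high.
shots = [("anchor", 10.0), ("anchor", 8.0), ("field", 20.0), ("graphic", 3.0)]

total = sum(d for _, d in shots)
cluster_time = {}
for c, d in shots:
    cluster_time[c] = cluster_time.get(c, 0.0) + d

def importance(cluster, duration):
    # Duration weighted by the log-rarity of the shot's cluster.
    weight = cluster_time[cluster] / total
    return duration * math.log(1.0 / weight)

ranked = sorted(shots, key=lambda s: importance(*s), reverse=True)
```

Keyframes for the top-ranked shots could then be sized in proportion to their importance before packing, as the abstract describes.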
Authors:
Andreas Girgensohn,
Jonathan Foote,
Page (NA) Paper number 1492
Abstract:
This paper describes techniques for classifying video frames using
statistical models of reduced DCT or Hadamard transform coefficients.
When decimated in time and reduced using truncation or principal component
analysis, transform coefficients taken across an entire frame image
allow rapid modeling, segmentation, and similarity calculation. Unlike
color-histogram metrics, this approach models image composition and
works on grayscale images. Modeling the statistics of the transformed
video frame images gives a likelihood measure that allows video to
be segmented, classified, and ranked by similarity for retrieval. Experiments
are presented that show an 87% correct classification rate for different
classes. Applications are presented including a content-aware video
browser.
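The reduced-transform classification can be sketched as truncating each frame's 2-D DCT to a few low-frequency coefficients and comparing Gaussian log-likelihoods per class. The frame sizes, class statistics, and single full-covariance Gaussian per class are illustrative assumptions:

```python
import numpy as np
from scipy.fft import dctn
from scipy.stats import multivariate_normal

def frame_features(frame, k=3):
    """Truncated low-frequency 2-D DCT coefficients of a grayscale frame."""
    return dctn(frame, norm="ortho")[:k, :k].ravel()

rng = np.random.default_rng(2)
# Hypothetical grayscale training frames for two classes.
bright = [frame_features(rng.normal(200, 5, (8, 8))) for _ in range(20)]
dark = [frame_features(rng.normal(50, 5, (8, 8))) for _ in range(20)]

def fit(feats):
    # Gaussian model of the coefficient statistics, with a small
    # jitter to keep the covariance invertible.
    f = np.array(feats)
    return f.mean(0), np.cov(f.T) + 1e-6 * np.eye(f.shape[1])

models = {c: fit(f) for c, f in [("bright", bright), ("dark", dark)]}

def classify(frame):
    # Rank classes by likelihood of the frame's reduced coefficients.
    x = frame_features(frame)
    return max(models, key=lambda c: multivariate_normal(*models[c]).logpdf(x))

label = classify(rng.normal(195, 5, (8, 8)))
```

The same per-class likelihoods can also rank frames by similarity for retrieval, which is how the abstract links classification and search.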
Authors:
Mahalingam Ramkumar,
Ali N Akansu,
Aydin A Alatan,
Page (NA) Paper number 2460
Abstract:
We present an information-theoretic approach to obtain an estimate
of the number of bits that can be hidden in compressed video. We show
how embedding the message signal in a suitable transform domain rather
than the spatial domain can significantly increase the data hiding
capacity. We compare the data hiding capacities achievable for different
block transforms and show that the choice of the transform depends
on the robustness required. While it is better to choose transforms
with good energy compaction properties (such as the DCT or wavelet
transforms) when the robustness requirement is low, transforms with
poorer energy compaction (such as the Hadamard or Hartley transforms)
are preferable for higher robustness.
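An information-theoretic capacity estimate of this kind can be sketched by treating each transform coefficient as a Gaussian channel. The power values below are illustrative, not figures from the paper:

```python
import math

def hiding_capacity_bits(msg_power, noise_power, n_coeffs):
    """Shannon capacity per coefficient, 0.5*log2(1 + S/N), summed
    over the usable transform coefficients of a block."""
    per_coeff = 0.5 * math.log2(1.0 + msg_power / noise_power)
    return n_coeffs * per_coeff

# Raising the robustness requirement acts like stronger attack/
# quantization noise, shrinking the number of bits that can be hidden.
low_robustness = hiding_capacity_bits(4.0, 1.0, 64)   # mild noise
high_robustness = hiding_capacity_bits(4.0, 16.0, 64)  # strong noise
```

Under this model, a transform changes how message power and noise are distributed across coefficients, which is why the best transform depends on the robustness required.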
Authors:
Zhibin Lei,
Yufeng Liang,
Weicong Wang,
Page (NA) Paper number 2113
Abstract:
In this paper we propose a novel system of semantic feature extraction
and retrieval for interior design and decoration application. The system,
V^2ID (Virtual Visual Interior Design), uses colored texture and spatial
edge layout to obtain simple information about global room environment.
We address the domain specific segmentation problem in our application
and the techniques for obtaining semantic features from a room environment.
We also discuss heuristics for making use of these features (color,
texture or shape) to retrieve objects from an existing database. The
final resynthesized room environment with original scene and novel
object from database is created for the purpose of animation and virtual
room walkthrough.
Authors:
Andreas Kassler, University of Ulm, Dept. Distributed Systems, Oberer Eselsberg, 89069 Ulm, Germany (Germany)
Oliver Schirpf, University of Ulm, Dept. Distributed Systems, Oberer Eselsberg, 89069 Ulm, Germany (Germany)
Peter Schulthess, University of Ulm, Dept. Distributed Systems, Oberer Eselsberg, 89069 Ulm, Germany (Germany)
Page (NA) Paper number 1380
Abstract:
Within this paper we simulate the transmission of MPEG-2 Transport
Stream (TS) Packets over a wireless ATM network. Based on a finite
state radio channel model for the physical layer of a wireless ATM
link including the characteristics of the ATM and MAC layer, different
packing schemes are evaluated for encapsulating MPEG-2 Transport Stream
packets in ATM Adaptation Layer 5 (AAL5) PDUs. We analyze the performance
with respect to both delay and visual quality in terms of PSNR based
on a calculated cell error ratio (CER) for each given state of the
radio model. The statistics show that the 1TP (one MPEG-2 TS per AAL5
PDU) scheme outperforms all other packing schemes in terms of visual
quality. At medium channel quality (38 dB), quality is judged to be
good, although the CER may be as high as 25%.
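The bandwidth side of the packing trade-off can be computed directly from the standard sizes involved (188-byte TS packets, 48-byte ATM cell payloads, 8-byte AAL5 trailer); the loss side follows because one errored cell discards the whole AAL5 PDU:

```python
import math

TS_PACKET = 188      # bytes in one MPEG-2 Transport Stream packet
AAL5_TRAILER = 8     # AAL5 CPCS trailer bytes
CELL_PAYLOAD = 48    # ATM cell payload bytes

def cells_per_pdu(ts_packets_per_pdu):
    """ATM cells needed for one AAL5 PDU holding N TS packets
    (payload plus trailer, padded up to whole cells)."""
    payload = ts_packets_per_pdu * TS_PACKET + AAL5_TRAILER
    return math.ceil(payload / CELL_PAYLOAD)

# Packing more TS packets per PDU wastes less padding, but a single
# lost cell then discards more TS packets at once, which is why the
# 1TP scheme wins on visual quality over an error-prone radio link.
one_tp = cells_per_pdu(1)   # cells to carry 1 TS packet
two_tp = cells_per_pdu(2)   # cells to carry 2 TS packets
```

Here 1TP spends 5 cells per TS packet versus 4 for 2TP, so its advantage is purely in limiting the damage of each cell error, consistent with the abstract's finding.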