Audio and Video Processing for Multimedia Applications

Automated Generation of News Content Hierarchy By Integrating Audio, Video, and Text Information

Authors:

Qian Huang,
Zhu Liu,
Aaron E Rosenberg,
David Gibbon,
Behzad Shahraray,

Page (NA) Paper number 2373

Abstract:

This paper addresses the problem of generating semantically meaningful content by integrating information from different media. The goal is to automatically construct a compact yet meaningful abstraction of the multimedia data that can serve as an effective index table, allowing users to browse large amounts of data in a non-linear fashion with flexibility, efficiency, and confidence. We propose an integrated solution in the context of news broadcasts that simultaneously uses cues from video, audio, and text to achieve this goal. Some experimental results are presented and discussed in the paper.

IC992373.PDF (From Author) IC992373.PDF (Rasterized)


Finding Presentations in Recorded Meetings Using Audio and Video Features

Authors:

Jonathan Foote,
John Boreczky,
Lynn Wilcox,

Page (NA) Paper number 1490

Abstract:

This paper describes a method for finding segments in video-recorded meetings that correspond to presentations. These segments serve as indexes into the recorded meeting. The system automatically detects intervals of video that correspond to presentation slides. We assume that only one person speaks during an interval when slides are detected. Thus these intervals can be used as training data for a speaker spotting system. An HMM is automatically constructed and trained on the audio data from each slide interval. A Viterbi alignment then resegments the audio according to speaker. Since the same speaker may talk across multiple slide intervals, the acoustic data from these intervals is clustered to yield an estimate of the number of distinct speakers and their order. This allows the individual presentations in the video to be identified from the location of each presenter's speech. Results are presented for a corpus of six meeting videos.
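The Viterbi resegmentation step can be illustrated with a minimal sketch. It assumes that a per-frame log-likelihood under each speaker model has already been computed, and it uses a hypothetical fixed penalty (`switch_logprob`) for changing speaker between frames. The HMM topology and training procedure of the paper are not given in the abstract, so this shows only the general decoding technique, not the authors' system.

```python
import numpy as np

def viterbi_segment(frame_loglik, switch_logprob=-5.0):
    """Assign each audio frame to a speaker with Viterbi decoding.

    frame_loglik: (T, S) array, log-likelihood of frame t under speaker model s.
    switch_logprob: assumed (unnormalised) log-score for changing speaker;
                    staying with the same speaker scores 0.
    Returns an array of T speaker indices.
    """
    T, S = frame_loglik.shape
    trans = np.full((S, S), switch_logprob)
    np.fill_diagonal(trans, 0.0)

    score = frame_loglik[0].copy()          # best score ending in each speaker
    back = np.zeros((T, S), dtype=int)      # best predecessor per frame/speaker
    for t in range(1, T):
        cand = score[:, None] + trans       # cand[i, j]: move from speaker i to j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + frame_loglik[t]

    # Trace the best path backwards from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

The switch penalty makes the decoder ignore isolated frames that briefly favour another speaker, which is why an alignment of this kind produces contiguous speaker segments rather than frame-by-frame decisions.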

IC991490.PDF (From Author) IC991490.PDF (Rasterized)


Video Content Extraction and Representation Using a Joint Audio and Video Processing

Authors:

Caterina Saraceno,

Page (NA) Paper number 1685

Abstract:

Computer technology allows large collections of digital material to be archived. At the same time, the increasing availability of potentially interesting data makes retrieving the desired information difficult. Currently, access to such information is limited to textual queries or to characteristics such as color or texture. The demand for new solutions that let ordinary users easily access, store, and retrieve relevant audio-visual information is becoming urgent. One possible solution is to organize the audio-visual data hierarchically so as to create a nested indexing structure that provides efficient access to relevant information at each level of the hierarchy. This work presents an automatic methodology to extract and hierarchically represent the semantics of the content, based on a joint audio and visual analysis. Descriptions of each medium (audio, video) are used to recognize higher-level meaningful structures, such as specific types of scenes, or, at the highest level, correlations beyond the temporal organization of the information that reflect classes of visual, audio, or audio-visual types. Once a hierarchy has been extracted from the data analysis, a nested indexing structure can be created to access relevant information at a specific level of detail, according to the user's requirements.

IC991685.PDF (From Author) IC991685.PDF (Rasterized)


Unsupervised Clustering of Ambulatory Audio and Video

Authors:

Brian P Clarkson,
Alex Pentland,

Page (NA) Paper number 2385

Abstract:

A truly personal and reactive computer system should have access to the same information as its user, including the ambient sights and sounds. To this end, we have developed a system for extracting events and scenes from natural audio/visual input. We find our system can (without any prior labeling of data) cluster the audio/visual data into events, such as passing through doors and crossing the street. Also, we hierarchically cluster these events into scenes and get clusters that correlate with visiting the supermarket, or walking down a busy street.

IC992385.PDF (From Author) IC992385.PDF (Rasterized)


Summarizing Video using a Shot Importance Measure and a Frame-Packing Algorithm

Authors:

Shingo Uchihashi,
Jonathan Foote,

Page (NA) Paper number 1494

Abstract:

This paper presents methods of generating compact pictorial summarizations of video. By calculating a measure of shot importance, video can be summarized by de-emphasizing or discarding less important information, such as repeated or common scenes. In contrast to other approaches that present keyframes for each shot, this measure allows summarization by presenting only the most important shots. Selected keyframes can also be resized depending on their relative importance. We present an efficient packing algorithm that constructs a pictorial representation from differently sized keyframes. This results in a compact and visually pleasing summary reminiscent of a comic book.

IC991494.PDF (From Author) IC991494.PDF (Rasterized)


Video Classification Using Transform Coefficients

Authors:

Andreas Girgensohn,
Jonathan Foote,

Page (NA) Paper number 1492

Abstract:

This paper describes techniques for classifying video frames using statistical models of reduced DCT or Hadamard transform coefficients. When decimated in time and reduced using truncation or principal component analysis, transform coefficients taken across an entire frame image allow rapid modeling, segmentation, and similarity calculation. Unlike color-histogram metrics, this approach models image composition and works on grayscale images. Modeling the statistics of the transformed video frame images gives a likelihood measure that allows video to be segmented, classified, and ranked by similarity for retrieval. Experiments are presented that show an 87% correct classification rate for different classes. Applications are presented including a content-aware video browser.
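A minimal sketch of this kind of classifier follows. It assumes truncation to the low-frequency corner of a 2-D DCT as the reduction step and a diagonal-covariance Gaussian as the statistical model. Both are illustrative choices (the abstract does not specify the model form), and the function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def frame_features(frame, keep=4):
    """2-D DCT of a whole grayscale frame, truncated to keep x keep coefficients."""
    rows, cols = frame.shape
    coeffs = dct_matrix(rows) @ frame @ dct_matrix(cols).T
    return coeffs[:keep, :keep].ravel()

def fit_gaussian(features):
    """Fit a diagonal Gaussian (assumed model) to a (num_frames, dim) array."""
    return features.mean(axis=0), features.var(axis=0) + 1e-6

def log_likelihood(x, model):
    mean, var = model
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def classify(frame, models, keep=4):
    """Return the class name whose model gives the frame the highest likelihood."""
    x = frame_features(frame, keep)
    return max(models, key=lambda name: log_likelihood(x, models[name]))
```

Because the features come from the whole frame rather than from a color histogram, two classes that differ only in spatial layout (for example, a bright region on the left versus at the bottom) produce different low-frequency coefficients and can be separated even in grayscale.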

IC991492.PDF (From Author) IC991492.PDF (Rasterized)


On the Choice of Transforms for Data Hiding in Compressed Video

Authors:

Mahalingam Ramkumar,
Ali N Akansu,
Aydin A Alatan,

Page (NA) Paper number 2460

Abstract:

We present an information-theoretic approach to obtain an estimate of the number of bits that can be hidden in compressed video. We show how embedding the message signal in a suitable transform domain rather than the spatial domain can significantly increase the data hiding capacity. We compare the data hiding capacities achievable for different block transforms and show that the choice of transform depends on the robustness required. While it is better to choose transforms with good energy compaction (such as the DCT or wavelet transforms) when the robustness requirement is low, transforms with poorer energy compaction (such as the Hadamard or Hartley transforms) are preferable for higher robustness.
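The difference in energy compaction between the DCT and the Hadamard transform can be made concrete with the standard transform coding gain (arithmetic mean of the coefficient variances divided by their geometric mean). The sketch below assumes a first-order autoregressive source with correlation 0.95 as a stand-in for image data. That source model is an assumption for illustration, and the code measures only energy compaction, not the paper's data hiding capacity.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def hadamard_matrix(n):
    """Orthonormal Hadamard matrix; n must be a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def ar1_covariance(n, rho):
    """Covariance of a unit-variance AR(1) source (assumed signal model)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def coding_gain(transform, cov):
    """Arithmetic over geometric mean of the transform coefficient variances."""
    variances = np.diag(transform @ cov @ transform.T)
    return variances.mean() / np.exp(np.log(variances).mean())

cov = ar1_covariance(8, 0.95)
gain_dct = coding_gain(dct_matrix(8), cov)
gain_hadamard = coding_gain(hadamard_matrix(8), cov)
```

A gain of 1 means the energy stays spread evenly across coefficients (as with no transform at all). Under this source model the DCT gain is higher than the Hadamard gain, which matches the abstract's grouping of the DCT with the good-compaction transforms and Hadamard with the poorer ones.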

IC992460.PDF (From Author) IC992460.PDF (Rasterized)


V2ID: Virtual Visual Interior Design System

Authors:

Zhibin Lei,
Yufeng Liang,
Weicong Wang,

Page (NA) Paper number 2113

Abstract:

In this paper we propose a novel system of semantic feature extraction and retrieval for interior design and decoration applications. The system, V^2ID (Virtual Visual Interior Design), uses colored texture and spatial edge layout to obtain simple information about the global room environment. We address the domain-specific segmentation problem in our application and the techniques for obtaining semantic features from a room environment. We also discuss heuristics for using these features (color, texture, or shape) to retrieve objects from an existing database. The final resynthesized room environment, combining the original scene with novel objects from the database, is created for animation and virtual room walkthrough.

IC992113.PDF (From Author) IC992113.PDF (Rasterized)


Simulating MPEG-2 Transport Stream Transmission over Wireless ATM

Authors:

Andreas Kassler, University of Ulm, Dept. Distributed Systems, Oberer Eselsberg, 89069 Ulm, Germany
Oliver Schirpf, University of Ulm, Dept. Distributed Systems, Oberer Eselsberg, 89069 Ulm, Germany
Peter Schulthess, University of Ulm, Dept. Distributed Systems, Oberer Eselsberg, 89069 Ulm, Germany

Page (NA) Paper number 1380

Abstract:

In this paper we simulate the transmission of MPEG-2 Transport Stream (TS) packets over a wireless ATM network. Based on a finite-state radio channel model for the physical layer of a wireless ATM link, including the characteristics of the ATM and MAC layers, different packing schemes are evaluated for encapsulating MPEG-2 Transport Stream packets in ATM Adaptation Layer 5 (AAL5) PDUs. We analyze the performance with respect to both delay and visual quality in terms of PSNR, based on a calculated cell error ratio (CER) for each given state of the radio model. The statistics show that the 1TP scheme (one MPEG-2 TS packet per AAL5 PDU) outperforms all other packing schemes in terms of visual quality. At medium channel quality (38 dB), quality is judged to be good, although the CER may be as high as 25%.
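The trade-off between packing schemes can be sketched with some simple arithmetic. The sizes used are the standard ones: a 188-byte TS packet, an 8-byte AAL5 trailer, and 53-byte ATM cells carrying 48 bytes of payload. The loss model is an assumption made for illustration: cell errors are treated as independent and a PDU is counted as lost if any of its cells is in error. The paper's finite-state channel model and any wireless-specific overhead are not reproduced here.

```python
import math

TS_PACKET_BYTES = 188      # MPEG-2 Transport Stream packet size
AAL5_TRAILER_BYTES = 8     # AAL5 CPCS-PDU trailer
CELL_PAYLOAD_BYTES = 48    # payload bytes per ATM cell
CELL_BYTES = 53            # payload plus 5-byte cell header

def cells_per_pdu(ts_packets):
    """ATM cells needed for one AAL5 PDU carrying ts_packets TS packets."""
    payload = ts_packets * TS_PACKET_BYTES + AAL5_TRAILER_BYTES
    return math.ceil(payload / CELL_PAYLOAD_BYTES)   # the PDU is padded to whole cells

def efficiency(ts_packets):
    """Fraction of transmitted bytes that are TS payload."""
    return ts_packets * TS_PACKET_BYTES / (cells_per_pdu(ts_packets) * CELL_BYTES)

def pdu_loss_probability(ts_packets, cer):
    """Probability that at least one cell of the PDU is in error.

    Assumes independent cell errors with probability cer (an illustrative
    simplification of the radio channel).
    """
    return 1.0 - (1.0 - cer) ** cells_per_pdu(ts_packets)
```

One TS packet per PDU needs 5 cells (196 bytes padded to 240), while two TS packets fit exactly into 8 cells (384 bytes). Packing two packets is therefore more bandwidth-efficient, but under this simple loss model each PDU is more likely to be discarded and takes two TS packets with it.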

IC991380.PDF (From Author) IC991380.PDF (Rasterized)
