MMSP-1.1

Time-Series Active Search for Quick Retrieval of Audio and Video
Kunio Kashino, Gavin Smith, Hiroshi Murase (NTT Basic Research Laboratories)

This paper proposes a search method that can quickly detect and locate a known sound (video) in a long audio (video) stream. The method is based on active search, which reduces the number of candidate matches between reference and input signals by a factor of roughly 10 to 100 compared with exhaustive search, while guaranteeing the same retrieval accuracy. We proposed a quick search method in our previous paper; here we focus on improving its accuracy. To this end, the feature has been extended to the audio power spectrum, and the histogram windows are divided in time to incorporate temporal information. Tests carried out under practical conditions clearly show the accuracy improvement. The proposed method is still fast: once the features are calculated, it can correctly retrieve a 15-s commercial from a 6-h recording of TV broadcasts within 2 s.
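
The skipping idea that lets active search match exhaustive search can be sketched as follows. This is a minimal sketch, assuming that each frame has already been quantized to a discrete feature code, that similarity is the histogram intersection normalized by the window length N, and that a match is declared when the similarity S exceeds a threshold θ; the function names are hypothetical, and the temporal division of the histogram windows is not shown. Because shifting the window by one frame changes the intersection count by at most one, a window with S ≤ θ can be skipped ahead by ⌊N(θ − S)⌋ + 1 frames without passing any position where S exceeds θ.

```python
import math
from collections import Counter


def intersection(ref_hist, win_hist):
    """Histogram intersection: number of frame codes shared by two histograms."""
    return sum(min(count, win_hist[code]) for code, count in ref_hist.items())


def exhaustive_search(reference, stream, theta):
    """Baseline: evaluate the normalized intersection at every window position."""
    n = len(reference)
    ref_hist = Counter(reference)
    return [t for t in range(len(stream) - n + 1)
            if intersection(ref_hist, Counter(stream[t:t + n])) / n > theta]


def active_search(reference, stream, theta):
    """Return window start positions whose normalized intersection exceeds theta.

    A one-frame shift changes the intersection count by at most one, so a window
    with similarity s <= theta cannot exceed theta within the next
    floor(n * (theta - s)) positions; those positions are skipped.
    """
    n = len(reference)
    ref_hist = Counter(reference)
    hits = []
    t = 0
    while t + n <= len(stream):
        common = intersection(ref_hist, Counter(stream[t:t + n]))
        if common / n > theta:
            hits.append(t)
            t += 1
        else:
            # max(1, ...) guards against a zero skip caused by floating-point rounding.
            t += max(1, math.floor(n * theta - common) + 1)
    return hits
```

For example, with N = 8 and θ = 0.5, a window that shares only one frame code with the reference (S = 0.125) lets the search jump ahead ⌊8 × 0.375⌋ + 1 = 4 frames. A practical implementation would also update the window histogram incrementally instead of rebuilding it at every position.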

MMSP-1.2

Content-Based Video Indexing of TV Broadcast News Using Hidden Markov Models
Stefan Eickeler, Stefan Mueller (Gerhard-Mercator-University Duisburg)

This paper presents a new approach to content-based video indexing using Hidden Markov Models (HMMs). In this approach, one feature vector is calculated for each image of the video sequence, and these feature vectors are modeled and classified using HMMs. Among the advantages of this approach over other video indexing approaches is its automatic learning capability: the system is trained by presenting manually indexed video sequences. To improve the system, we use a video model that allows the classification of complex video sequences. The approach runs three times faster than real time. We tested our system on TV broadcast news, where a rate of 97.3% correctly classified frames demonstrates its effectiveness.
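
Classifying a sequence of per-frame feature vectors with HMMs typically means keeping one HMM per class and assigning the sequence to the model under which it is most likely. The sketch below assumes that each frame's feature vector has already been vector-quantized to a discrete symbol and scores the sequence with the forward algorithm in log space; the two-state models, their probabilities, and the class names are hypothetical, and the paper's video model for complex sequences is not reproduced.

```python
import math

NEG_INF = -math.inf


def safe_log(p):
    """log(p), with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm: log P(obs | HMM) for a discrete-observation HMM.

    start[s]:    initial probability of state s
    trans[r][s]: probability of moving from state r to state s
    emit[s][o]:  probability of emitting symbol o in state s
    """
    states = range(len(start))
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in states]
    for o in obs[1:]:
        alpha = [log_sum_exp([alpha[r] + safe_log(trans[r][s]) for r in states])
                 + safe_log(emit[s][o])
                 for s in states]
    return log_sum_exp(alpha)


def classify_sequence(obs, models):
    """Return the class whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical 2-state models over 3 vector-quantized frame symbols (0, 1, 2).
MODELS = {
    "anchor_shot": ([0.9, 0.1],
                    [[0.95, 0.05], [0.10, 0.90]],
                    [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]]),
    "report": ([0.5, 0.5],
               [[0.70, 0.30], [0.30, 0.70]],
               [[0.1, 0.3, 0.6], [0.2, 0.6, 0.2]]),
}
```

In a trained system, the start, transition, and emission probabilities would be estimated from the manually indexed training sequences (for example with Baum-Welch re-estimation) rather than set by hand.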

MMSP-1.3

Hierarchical Classification of Audio Data for Archiving and Retrieving
Tong Zhang, C.-C. Jay Kuo (Integrated Media Systems Center and Department of Electrical Engineering-Systems, University of Southern California)

A hierarchical system for audio classification and retrieval based on audio content analysis is presented in this paper. The system consists of three stages. The first stage, called coarse-level audio segmentation and classification, segments audio recordings and classifies them into speech, music, several types of environmental sounds, and silence, based on morphological and statistical analysis of the temporal curves of short-time features of the audio signals. In the second stage, environmental sounds are further classified into finer classes such as applause, rain, and bird sounds. This fine-level classification is based on time-frequency analysis of the audio signals and uses the hidden Markov model (HMM) for classification. In the third stage, query-by-example audio retrieval is implemented, in which sounds similar to an input audio sample are retrieved. The proposed system achieves an accuracy higher than 90% for coarse-level audio classification. Examples of fine-level audio classification and audio retrieval are also provided.
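
The coarse-level stage works on temporal curves of short-time features. As a minimal sketch, assuming short-time energy and zero-crossing rate as example features (the abstract does not list the exact feature set) and hypothetical frame and hop sizes, such curves can be computed as follows; the morphological and statistical rules that the system applies to these curves are not reproduced.

```python
def short_time_curves(signal, frame_len=256, hop=128):
    """Return (energy, zcr) curves with one value per frame.

    energy: mean squared amplitude of the frame.
    zcr:    fraction of adjacent sample pairs whose signs differ.
    frame_len and hop are hypothetical defaults, in samples.
    """
    energy, zcr = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy.append(sum(x * x for x in frame) / frame_len)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        zcr.append(crossings / (frame_len - 1))
    return energy, zcr
```

As a sanity check, silence gives an energy curve of zeros, while a pure tone of frequency f sampled at rate fs gives a zero-crossing rate close to 2f/fs (0.25 for a 1-kHz tone at 8 kHz). Speech typically shows a more strongly fluctuating zero-crossing curve than music, which is the kind of temporal behavior the coarse-level rules can exploit.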

MMSP-1.4

A Fast Audio Classification from MPEG Coded Data
Yasuyuki Nakajima (KDD R&D Labs.), Yang Lu (University of Electro-Communications), Masaru Sugano, Akio Yoneyama (KDD R&D Labs.), Hiromasa Yanagihara (KDD R&D Labs.), Akira Kurematsu (University of Electro-Communications)

Audio classification is becoming a very important task for purposes such as automatic keyword spotting and other content-based audio-visual query systems. In this paper, we describe a fast and accurate audio classification method that operates in the MPEG coded-data domain. First, silent segments are detected using an approach that is robust to different recording conditions. Then the non-silent segments are classified into three types (music, speech, and applause) using the temporal density, bandwidth, and center frequency of the subband energy. To remain as robust as possible across a variety of audio sources, we use a Bayes discriminant function for multivariate Gaussian distributions instead of manually adjusting a threshold for each discriminator. In the experiments, each one-second segment of MPEG audio data is classified, and about 90% of audio and speech segments are successfully detected. In terms of speed, detection requires less than 20% of the processing power needed for MPEG audio decoding.
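
For class i with mean vector mu_i, covariance matrix Sigma_i, and prior P(i), the Bayes discriminant for a multivariate Gaussian can be written (dropping the constant term shared by all classes) as g_i(x) = -1/2 (x - mu_i)^T Sigma_i^-1 (x - mu_i) - 1/2 ln|Sigma_i| + ln P(i), and a segment is assigned to the class with the largest g_i. The sketch below evaluates this in plain Python for three-dimensional feature vectors such as (temporal density, bandwidth, center frequency). The class parameters are hypothetical placeholders; in practice they would be estimated from labelled training segments.

```python
import math


def _solve_and_logdet(cov, b):
    """Solve cov @ y = b and return (y, ln|cov|) by Gaussian elimination.

    cov is assumed symmetric positive definite, so its determinant is positive
    and ln|cov| is the sum of the log absolute pivots.
    """
    n = len(b)
    a = [list(row) + [b[i]] for i, row in enumerate(cov)]  # augmented matrix
    logdet = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        logdet += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        y[r] = (a[r][n] - sum(a[r][c] * y[c] for c in range(r + 1, n))) / a[r][r]
    return y, logdet


def discriminant(x, mean, cov, prior):
    """g(x) = -0.5 (x-mu)^T Sigma^-1 (x-mu) - 0.5 ln|Sigma| + ln P(class)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    y, logdet = _solve_and_logdet(cov, d)
    mahalanobis = sum(di * yi for di, yi in zip(d, y))
    return -0.5 * mahalanobis - 0.5 * logdet + math.log(prior)


def classify_segment(x, classes):
    """Pick the class with the largest discriminant value."""
    return max(classes, key=lambda name: discriminant(x, *classes[name]))


# Hypothetical (mean, covariance, prior) per class; features scaled to about [0, 1].
CLASSES = {
    "music": ([0.9, 0.8, 0.5],
              [[0.01, 0, 0], [0, 0.02, 0.005], [0, 0.005, 0.02]], 1 / 3),
    "speech": ([0.5, 0.3, 0.3],
               [[0.03, 0, 0], [0, 0.01, 0], [0, 0, 0.01]], 1 / 3),
    "applause": ([0.95, 0.9, 0.8],
                 [[0.005, 0, 0], [0, 0.01, 0], [0, 0, 0.01]], 1 / 3),
}
```

Because each class carries its own covariance, the resulting decision boundaries are quadratic in the features, which is what replaces the hand-tuned per-discriminator thresholds.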

MMSP-1.5

Image Retrieval Based on Energy Histograms of the Low-Frequency DCT Coefficients
Jose A Lay, Ling Guan (School of Electrical and Information Engineering, University of Sydney)

With the increasing use of compressed images, an intuitive way to lower the computational complexity of a practical image retrieval system is to perform the retrieval computations directly in the compressed domain. In this paper, we investigate the use of energy histograms of the low-frequency DCT coefficients as features for the retrieval of DCT-compressed images. We propose a feature set that can identify similarities between image representations that differ by one of several lossless DCT transformations. We then use the features to construct an image retrieval system based on the real-time image retrieval model. We observe that the proposed features are sufficient for high-level retrieval on medium-size image databases, and that by introducing transpositional symmetry, the features can be made to accommodate several lossless DCT transformations, such as horizontal and vertical mirroring, rotating, transposing, and transversing.
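
Why energy features can be insensitive to these lossless transformations can be sketched as follows. Mirroring an 8×8 block horizontally multiplies each DCT coefficient F(u, v) by (-1)^v (vertical mirroring by (-1)^u), so its energy F(u, v)^2 is unchanged, while transposing swaps F(u, v) and F(v, u). Pooling each coefficient with its transposed partner therefore makes the histograms invariant to mirroring, transposition, and their combinations such as 90° rotation. This is a minimal sketch, not the authors' feature set: the coefficient groups, the log-energy binning, and the function names are hypothetical, and image sides are assumed to be multiples of 8.

```python
import math

N = 8  # DCT block size
# Hypothetical low-frequency groups: each coefficient is pooled with its
# transposed partner (u, v) <-> (v, u).
LOW_FREQ_GROUPS = [[(0, 0)], [(0, 1), (1, 0)], [(1, 1)], [(0, 2), (2, 0)]]
BINS = 14  # hypothetical: equal-width bins over log10(1 + energy) in [0, 7)


def dct_coeff(block, u, v):
    """Orthonormal 2-D DCT-II coefficient F(u, v); u is vertical, v horizontal frequency."""
    cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
    cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
    s = sum(block[y][x]
            * math.cos((2 * y + 1) * u * math.pi / (2 * N))
            * math.cos((2 * x + 1) * v * math.pi / (2 * N))
            for y in range(N) for x in range(N))
    return cu * cv * s


def energy_histograms(image):
    """One histogram of coefficient energy F(u, v)**2 per group, over all 8x8 blocks."""
    hists = [[0] * BINS for _ in LOW_FREQ_GROUPS]
    for by in range(0, len(image), N):
        for bx in range(0, len(image[0]), N):
            block = [row[bx:bx + N] for row in image[by:by + N]]
            for hist, group in zip(hists, LOW_FREQ_GROUPS):
                for u, v in group:
                    energy = dct_coeff(block, u, v) ** 2
                    hist[min(BINS - 1, int(math.log10(1 + energy) * BINS / 7))] += 1
    return hists
```

In a compressed-domain system the coefficients would be read from the entropy-decoded JPEG data instead of being recomputed from pixels; dct_coeff is included here only so the example runs on its own.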

MMSP-1.6

Texture Features for DCT-Coded Image Retrieval and Classification
Yu-Len Huang, Ruey-Feng Chang (Department of Computer Science and Information Engineering, National Chung Cheng University, Taiwan, R.O.C.)

The multiresolution wavelet transform has been shown to be an effective technique for texture analysis, with very good performance. However, a large number of images are compressed by methods based on the discrete cosine transform (DCT), so obtaining wavelet-based texture features for a DCT-coded image first requires decompressing it with the inverse DCT. This paper proposes multiresolution reordered features for texture analysis, which are generated directly from the DCT coefficients of the DCT-coded image. Comparisons on the Brodatz texture database with subband-energy features extracted from the wavelet transform and with conventional DCT features indicate that the proposed method provides the best texture pattern retrieval accuracy and a much better correct classification rate. The proposed DCT-based features are expected to be very useful and efficient for texture pattern retrieval and classification in large DCT-coded image databases. Detailed simulation results can be found at http://www.cs.ccu.edu.tw/~hyl/mrdct/.
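
One way to picture "multiresolution reordering" is to regroup the 64 coefficients of each 8×8 DCT block into a wavelet-like dyadic tree: the DC coefficient, then the 2×2, 4×4, and 8×8 corner regions (each minus the previous one), split into three orientation bands, and to use subband energies as texture features. The sketch below assumes exactly that grouping and uses the mean squared coefficient as the energy measure; both are assumptions for illustration, not details taken from the abstract. Its input is a list of 8×8 coefficient blocks taken from the coded image, so no inverse DCT is needed.

```python
def dyadic_subbands(n=8):
    """Group (u, v) indices of an n x n DCT block into 1 + 3*log2(n) wavelet-like subbands."""
    bands = [[(0, 0)]]
    s = 1
    while s < n:
        lo, hi = range(0, s), range(s, 2 * s)
        bands.append([(u, v) for u in lo for v in hi])  # high horizontal frequency
        bands.append([(u, v) for u in hi for v in lo])  # high vertical frequency
        bands.append([(u, v) for u in hi for v in hi])  # high in both directions
        s *= 2
    return bands


def subband_energy_features(dct_blocks):
    """Mean squared coefficient per subband, averaged over all DCT blocks."""
    bands = dyadic_subbands(len(dct_blocks[0]))
    return [sum(blk[u][v] ** 2 for blk in dct_blocks for (u, v) in band)
            / (len(band) * len(dct_blocks))
            for band in bands]
```

For 8×8 blocks this yields 10 subbands, matching the number of subbands in a three-level 2-D wavelet decomposition, which is what makes the resulting energies comparable in spirit to wavelet subband-energy features.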

MMSP-1.7

An Efficient Low-Dimensional Color Indexing Scheme for Region-Based Image Retrieval
Yining Deng, B. S. Manjunath (ECE Dept., Univ. of California, Santa Barbara)

In this work, an efficient low-dimensional color indexing scheme for region-based image retrieval is presented. The colors in each image region are first quantized so that only a small number of cluster centroids are needed to represent the region color information. The proposed color feature descriptor consists of these quantized colors and their percentages in the region. A similarity distance measure is defined and shown to be equivalent to the quadratic color histogram distance measure. The quantized colors are indexed in the 3-D color space so that high-dimensional indexing can be avoided. During the search process, each quantized color in the query is used as a separate cue to find matches containing that color. The matches from all the query colors are then joined to obtain the final retrievals. Experimental results show that the proposed scheme is fast and accurate compared to the color histogram approach.
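
The distance between two descriptors of quantized colors and percentages can be written so that it equals a quadratic histogram distance (h1 - h2)^T A (h1 - h2) over the union of both color sets. The sketch below assumes a similarity coefficient of the common form a = 1 - d/d_max when the color distance d is at most a threshold T_d and 0 otherwise; the values of T_d and d_max, the Euclidean color distance, and the function names are hypothetical. The 3-D color indexing and the joining of per-color matches are not shown.

```python
import math

T_D = 20.0          # hypothetical: colors farther apart than this are unrelated
D_MAX = 1.2 * T_D   # hypothetical normalizing distance (must exceed T_D)


def similarity(c1, c2):
    """a(c1, c2) = 1 - d / D_MAX if d <= T_D else 0, with d the Euclidean distance."""
    d = math.dist(c1, c2)  # requires Python 3.8+
    return 1.0 - d / D_MAX if d <= T_D else 0.0


def cross_term(colors_a, weights_a, colors_b, weights_b):
    """Sum over all color pairs of weight_a * weight_b * a(color_a, color_b)."""
    return sum(wa * wb * similarity(ca, cb)
               for ca, wa in zip(colors_a, weights_a)
               for cb, wb in zip(colors_b, weights_b))


def descriptor_distance(desc1, desc2):
    """Squared distance between two (colors, percentages) descriptors.

    Expands (h1 - h2)^T A (h1 - h2), where h1 and h2 are histograms over the
    union of both color sets and A[k][l] = similarity(color_k, color_l).
    """
    (c1, p1), (c2, p2) = desc1, desc2
    return (cross_term(c1, p1, c1, p1) + cross_term(c2, p2, c2, p2)
            - 2 * cross_term(c1, p1, c2, p2))
```

Since only a handful of quantized colors describe each region, the double sums stay small, whereas a full color histogram would require a quadratic form over every bin.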

MMSP-1.8

Vector-Wavelet-Based Scalable Indexing and Retrieval Systems for Large Color Image Archives
Elif Albuz, Erturk D Kocalar, Ashfaq A Khokhar (University of Delaware)

This paper presents an efficient content-based indexing and retrieval mechanism based on the vector wavelet coefficients of color images. We use highly decorrelated wavelet coefficient planes to obtain a search-efficient feature space. The feature space is then indexed using properties of all the images in the database. Therefore, the feature key of an image reflects not only the content of the image itself but also how much the image differs from the other images stored in the database. The search time depends only on the number of images similar to the query image and not on the size of the entire database. The system is scalable and provides fast retrieval. We show that, in a database of 1000 images, a query takes less than 50 ms on a 266-MHz Pentium processor, compared with several seconds of retrieval time for earlier systems proposed in the literature.


Last Update: February 4, 1999 Ingo Höntsch