9:30, MMEDIA-L3.1
AN AUDIO VIRTUAL DSP FOR MULTIMEDIA FRAMEWORKS
G. ZOIA, C. ALBERTI
The new MPEG-4 Audio standard provides two toolsets, Structured Audio (SA) and BInary Format for Scenes (BIFS), for synthetic audio generation, audio processing and multimedia content description.
Starting from a systematic analysis of SA and from the implementation of an efficient SA decoder, this paper describes the design of a virtual DSP architecture able to exploit the data-level parallelism contained in many typical audio processing algorithms. The proposed virtual DSP architecture shows good performance on general-purpose platforms and can be easily adapted and optimized for parallel superscalar devices. A port to a VLIW DSP device, and the results obtained on it, confirm the effectiveness and flexibility of the approach, which is particularly suitable for standalone embedded solutions.
9:50, MMEDIA-L3.2
PERFORMANCE EVALUATION AND COMPARISON OF ITU-T/ETSI VOICE ACTIVITY DETECTORS
F. BERITELLI, S. CASALE, G. RUGGERI
This paper presents a performance evaluation and comparison of
recent ITU-T and ETSI voice activity detection (VAD) algorithms. The
comparison was made using both objective and psychoacoustic
parameters, so as to obtain reliable judgements close to
subjective ones. A highly varied speech database was also set up
to evaluate the extent to which the VADs depend on language,
signal-to-noise ratio, and power level.
10:10, MMEDIA-L3.3
VERY QUICK AUDIO SEARCHING: INTRODUCING GLOBAL PRUNING TO THE TIME-SERIES ACTIVE SEARCH
A. KIMURA, K. KASHINO, T. KUROZUMI, H. MURASE
Previously, we proposed a histogram-based quick signal search method called
Time-Series Active Search (TAS). TAS is a method of searching through long audio or
video recordings for a specified segment, based on signal similarity. TAS is fast;
it can search through a 24-hour recording in 1 second after query-independent
preprocessing. However, an even faster method is required for very large audio
archives, for example a month's worth of recordings. Thus, we propose a
preprocessing method that significantly accelerates TAS. The core part of this method
comprises a global histogram clustering of long signals and a pruning scheme using
those clusters. Tests using broadcast recordings indicate that the proposed algorithm
achieves a search speed approximately 3 to 30 times faster than TAS under realistic
circumstances. Exactly the same search results as TAS are theoretically guaranteed.
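As a rough illustration of what a histogram-based signal search with skipping can look like, the sketch below slides a window over a sequence of quantized feature symbols and compares histograms by intersection count. It is not the authors' implementation, and it does not include the global clustering and pruning step proposed in the paper. The function names, the use of plain symbols in place of audio features, and the integer threshold `min_count` are all assumptions made for the example. The skip relies on one observation: shifting the window by one frame can raise the intersection count by at most 1, so positions that cannot reach the threshold are skipped without changing the result.

```python
from collections import Counter


def intersection(h1, h2):
    """Histogram intersection: number of symbols the two histograms share."""
    return sum(min(count, h2[symbol]) for symbol, count in h1.items())


def active_search(stream, query, min_count):
    """Return window start indices whose histogram shares at least
    min_count symbols with the query histogram (hypothetical helper)."""
    d = len(query)
    q_hist = Counter(query)
    hits = []
    i = 0
    while i + d <= len(stream):
        c = intersection(q_hist, Counter(stream[i:i + d]))
        if c >= min_count:
            hits.append(i)
            i += 1
        else:
            # A one-frame shift adds at most 1 to the intersection count,
            # so the next (min_count - c - 1) positions cannot match.
            i += min_count - c
    return hits


# Symbols stand in for quantized audio features (an assumption).
print(active_search("abcabcxyzxyzabc", "xyz", 3))  # [6, 7, 8, 9]
```

Because histograms ignore order within the window, the shifted windows "yzx" and "zxy" also count as matches here.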
10:30, MMEDIA-L3.4
HIERARCHICAL ADAPTIVE REGULARISATION METHOD FOR DEPTH EXTRACTION FROM PLANAR RECORDING OF 3D-INTEGRAL IMAGES
S. MANOLACHE, M. MCCORMICK, S. KUNG
The paper presents a novel algorithm for object space reconstruction from the planar (2D) recorded data set of a 3D-integral image. The integral imaging system is described and the associated point spread function is given. The space data extraction is formulated as an inverse problem, which proves ill-conditioned, and is tackled by using a hierarchical multiresolution strategy and imposing additional conditions on the sought solution. The hierarchisation strategy and the two-phase adaptive constrained 3D-reconstruction algorithm, based on the use of two sigmoid functions, are presented. Finally, illustrative simulation results are given.
10:50, MMEDIA-L3.5
HIERARCHICAL SEGMENTATION USING LATENT SEMANTIC INDEXING IN SCALE SPACE
M. SLANEY, D. PONCELEON
This paper describes a new algorithm which discovers the hierarchical organization of a document or media presentation. We use latent semantic indexing to describe the semantic content of the signal, and scale-space segmentation to describe its features at many different scales. We present results from a text document and a video transcript.
11:10, MMEDIA-L3.6
THE CELLULAR TEXT TELEPHONE MODEM - THE SOLUTION FOR SUPPORTING TEXT TELEPHONE FUNCTIONALITY IN GSM NETWORKS
M. DOERBECKER, K. HELLWIG, F. JANSSON, T. FRANKKILA
Text telephone devices are text-based terminals that allow users to communicate by text via fixed-line telephone networks. Since cellular phone systems are sometimes subject to severe radio channel impairments, the modem signals of these text telephones are not always transmitted reliably. The Federal Communications Commission (FCC) has therefore required a solution that guarantees reliable transmission of text telephone data for emergency calls via cellular phone systems. For the North American PCS-1900 cellular phone systems, Standards Committee T1 has recently standardized a solution for this requirement. It is based on a new modem protocol, the Cellular Text Telephone Modem (CTM), whose signals can be reliably transmitted via the speech channel of cellular phone systems. After a short introduction to text telephony, this contribution describes this solution for PCS-1900 systems using CTM signals. The solution is in fact independent of the cellular system and works on virtually all speech channels.