9:30, AUDIO-P1.1
FAST ENCODING ALGORITHMS FOR MPEG-4 TWINVQ AUDIO TOOL
N. IWAKAMI, T. MORIYA, A. JIN, T. MORI, K. CHIKIRA
The ISO/IEC MPEG-4 Audio standard includes the TwinVQ
encoding tool. This tool is suitable for low-bit-rate
general audio coding, but its drawback is the computational
complexity of the encoder.
To develop a faster TwinVQ encoder, new fast vector
quantization algorithms, area-localized pre-selection
and hit-zone masking, are introduced.
These algorithms exploit the pre-selection and main-selection
procedure of the conjugate-structure vector quantization
used in TwinVQ.
The improvement is evaluated by measuring the encoding speed
and the quality of the reproduced sound.
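The pre-selection and main-selection procedure mentioned above can be sketched as follows. This is only a generic two-stage search for a conjugate-structure quantizer, assuming the reconstruction is the average of one codevector from each of two codebooks; it does not implement area-localized pre-selection or hit-zone masking, and the function names and the parameter `k` are hypothetical.

```python
from itertools import product


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def preselect(target, codebook, k):
    """Pre-selection: indices of the k codevectors closest to the target."""
    order = sorted(range(len(codebook)),
                   key=lambda i: sq_dist(target, codebook[i]))
    return order[:k]


def conjugate_vq_search(target, cb_a, cb_b, k):
    """Main selection: try every pair of pre-selected candidates.

    Assumption: the reconstruction is the average of the two codevectors.
    Returns (index_a, index_b, distortion).
    """
    cand_a = preselect(target, cb_a, k)
    cand_b = preselect(target, cb_b, k)
    best = None
    for i, j in product(cand_a, cand_b):
        recon = [(a + b) / 2 for a, b in zip(cb_a[i], cb_b[j])]
        d = sq_dist(target, recon)
        if best is None or d < best[0]:
            best = (d, i, j)
    return best[1], best[2], best[0]


# Toy codebooks, invented for illustration only.
cb_a = [[0, 0], [2, 0], [0, 2], [4, 4]]
cb_b = [[0, 0], [2, 2], [4, 0], [0, 4]]
result = conjugate_vq_search([1, 1], cb_a, cb_b, k=2)
```

With `k` candidates per codebook the main selection evaluates `k * k` pairs instead of every pair in the two codebooks, which is where the speed-up of such a scheme comes from.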
9:30, AUDIO-P1.2
SCALABLE AUDIO CODING USING THE NONUNIFORM MODULATED COMPLEX LAPPED TRANSFORM
Z. XIONG, A. SCHEUBLE
This paper introduces a scalable audio coder using the nonuniform modulated complex lapped transform (NMCLT)[1], which is a new nonuniform oversampled filter bank with a better combination of
time- and frequency-domain localization than previous designs.
Masking functions for different critical Bark bands are first
calculated directly from the NMCLT coefficients as perceptual weights, and arithmetic coding is then used to compress bit planes of the weighted NMCLT coefficients to generate a perceptually scalable audio bitstream. The loss in coding performance due to oversampling
is offset by limiting the amount of redundancy in the transform
and exploiting the correlations among the NMCLT basis functions.
Experiments show that our new coder outperforms a coder with the modulated lapped transform (MLT)[2] both objectively and subjectively.
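Bit-plane coding, on which the scalability described above rests, can be illustrated with a minimal sketch. It assumes the weighted coefficients have already been quantized to non-negative integer magnitudes; the NMCLT, the masking weights, sign handling, and the arithmetic coder are all omitted, and the function names are hypothetical.

```python
def to_bit_planes(values, num_planes):
    """Split non-negative integers into bit planes, most significant first."""
    return [[(v >> p) & 1 for v in values]
            for p in range(num_planes - 1, -1, -1)]


def from_bit_planes(planes, num_planes):
    """Rebuild integers from the leading planes; missing planes count as zero."""
    values = [0] * len(planes[0])
    for k, plane in enumerate(planes):
        shift = num_planes - 1 - k
        values = [v | (b << shift) for v, b in zip(values, plane)]
    return values


# Invented magnitudes standing in for quantized, weighted coefficients.
magnitudes = [5, 2, 7, 0]
planes = to_bit_planes(magnitudes, 3)
coarse = from_bit_planes(planes[:1], 3)   # decode only the first plane
full = from_bit_planes(planes, 3)         # decode all planes
```

Truncating the list of planes gives a coarser reconstruction of every coefficient at once, which is why a bitstream ordered by bit plane can be cut at any point.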
9:30, AUDIO-P1.3
FASTMPEG: TIME-SCALE MODIFICATION OF BIT-COMPRESSED AUDIO INFORMATION
M. COVELL, M. SLANEY, A. ROTHSTEIN
This paper describes techniques to change the playback speed of MPEG-compressed audio without first decompressing the audio file. There are two primary contributions in this paper. 1) We describe three techniques to perform time-scale modification in the maximally decimated domain. 2) We show how to infer the output of the auditory masking model on the new audio stream, using the information in the original file. This new FastMPEG algorithm is more than an order of magnitude more efficient than decompressing the audio, performing time-scale modification in the conventional time domain, and then recompressing. Samples of our results can be found at http://www.slaney.org/covell/FastMPEG/.
9:30, AUDIO-P1.4
A COMPUTATIONALLY EFFICIENT COCHLEAR FILTER BANK FOR PERCEPTUAL AUDIO CODING
F. BAUMGARTE
Many applications in auditory modeling require analysis filters that approximate the frequency selectivity given by psychophysical data, e.g. from masking experiments using narrow-band maskers. This frequency selectivity is largely determined by the
spectral decomposition process inside the human cochlea. Currently
used spectral decomposition schemes for masking modeling in audio coding generally do not achieve the non-uniform
time and frequency resolution provided by the cochlea. Instead, these applications take advantage of the computational efficiency of uniform filter banks or transforms at the expense of coding gain.
This paper presents a suitable analysis filter-bank structure employing cascaded low-order IIR filters and appropriate down-sampling to increase efficiency. In an application example, the filter responses were optimized to model auditory masking effects.
The results show that the time and frequency resolution of the filter bank matches or exceeds the masking properties. Thus, the filter bank enables improved masking modeling for audio coding at low computational cost.
9:30, AUDIO-P1.5
NEAR-OPTIMAL SELECTION OF ENCODING PARAMETERS FOR AUDIO CODING
A. AGGARWAL, S. REGUNATHAN, K. ROSE
We address the issue of optimizing the side information rate for efficient audio coding. In coders such as MPEG-4 AAC, at rates of around 16 kbps to 48 kbps, the side information rate forms a substantial part of the total rate. The parameter search procedure in the Verification Model optimizes each band separately and results in poor performance at low rates. We propose to jointly optimize the encoding parameters of all the bands. Finding the near-optimal solution by brute-force search is computationally prohibitive. However, the same solution is obtained at much lower complexity using a Viterbi search through a trellis. The search procedure is developed and evaluated for two objective measures, the average and the maximum noise-mask ratio.
For both measures, the trellis-based search yields substantially better solutions. In particular, trellis-based optimization of maximum noise-mask ratio greatly improves the performance of AAC at low rates. The resulting bit stream is standard-compatible, and the additional complexity due to the proposed optimization is only incurred at the encoder.
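The relationship between the brute-force search and the Viterbi search can be sketched on a toy problem. The sketch assumes an additive cost: each band has a cost per candidate parameter value, plus a transition cost between adjacent bands standing in for differentially coded side information. The cost tables and function names are hypothetical, and the paper's actual cost measures (in particular the maximum noise-mask ratio) are not modeled.

```python
from itertools import product


def path_cost(path, band_cost, transition_cost):
    """Total cost of one parameter choice per band."""
    total = sum(band_cost[n][s] for n, s in enumerate(path))
    total += sum(transition_cost(p, c) for p, c in zip(path, path[1:]))
    return total


def brute_force_select(band_cost, transition_cost):
    """Enumerate every path: num_states ** num_bands evaluations."""
    states = range(len(band_cost[0]))
    best = min(product(states, repeat=len(band_cost)),
               key=lambda path: path_cost(path, band_cost, transition_cost))
    return path_cost(best, band_cost, transition_cost), list(best)


def viterbi_select(band_cost, transition_cost):
    """Same minimum via dynamic programming over a trellis."""
    num_states = len(band_cost[0])
    cost = list(band_cost[0])       # best cost of a path ending in each state
    back = []                       # back-pointers, one list per later band
    for n in range(1, len(band_cost)):
        new_cost, pointers = [], []
        for s in range(num_states):
            prev = min(range(num_states),
                       key=lambda p: cost[p] + transition_cost(p, s))
            new_cost.append(cost[prev] + transition_cost(prev, s)
                            + band_cost[n][s])
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)
    state = min(range(num_states), key=lambda s: cost[s])
    total = cost[state]
    path = [state]
    for pointers in reversed(back):  # trace the survivor path backwards
        state = pointers[state]
        path.append(state)
    path.reverse()
    return total, path


# Invented costs: 3 bands, 3 candidate parameter values per band.
band_cost = [[4, 1, 3], [2, 5, 1], [3, 2, 6]]
step_cost = lambda prev, cur: 2 * abs(prev - cur)
total, path = viterbi_select(band_cost, step_cost)
```

Both searches reach the same minimum cost. The trellis search does work proportional to the number of bands times the square of the number of states, whereas the brute-force search grows exponentially with the number of bands.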
9:30, AUDIO-P1.6
LOSSLESS CODING OF AUDIO SIGNALS USING CASCADED PREDICTION
G. SCHULLER, B. YU, D. HUANG
A novel predictive lossless coding scheme is proposed.
The prediction is based on a new weighted cascaded least
mean squared (WCLMS) method. WCLMS is especially designed
for music/speech signals. It can be used either in
combination with psycho-acoustically pre-filtered signals
(an idea presented at ICASSP 2000) to obtain *perceptually*
lossless coding, or as a stand-alone lossless coder.
Experiments on a database of moderate size and a variety
of pre-filtered mono-signals show that the proposed lossless coder
(which needs about 2 bits/sample for pre-filtered signals)
outperforms the competing lossless coders WaveZip, Shorten, LTAC, and LPAC in terms of compression ratio.
http://www.multimedia.bell-labs.com
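As a rough illustration of predictive lossless coding with cascaded adaptive predictors, the sketch below chains normalized LMS stages, each predicting the error left by the previous one, and transmits the integer residual. It is not the WCLMS method: the weighting of the stages is omitted, and the stage orders, step size, and all names are assumptions chosen for the example.

```python
import math


class NLMSStage:
    """One normalized-LMS predictor over its own input history."""

    def __init__(self, order, mu):
        self.w = [0.0] * order
        self.hist = [0.0] * order
        self.mu = mu

    def predict(self):
        return sum(w * h for w, h in zip(self.w, self.hist))

    def update(self, value, prediction):
        """Adapt the weights, shift in the new input, return the error."""
        err = value - prediction
        norm = 1e-6 + sum(h * h for h in self.hist)
        self.w = [w + self.mu * err * h / norm
                  for w, h in zip(self.w, self.hist)]
        self.hist = [value] + self.hist[:-1]
        return err


class CascadedPredictor:
    """Stage k predicts the error signal left by stage k-1."""

    def __init__(self, orders, mu=0.5):
        self.stages = [NLMSStage(order, mu) for order in orders]
        self.preds = [0.0] * len(orders)

    def predict(self):
        self.preds = [stage.predict() for stage in self.stages]
        return round(sum(self.preds))   # integer prediction keeps coding lossless

    def update(self, sample):
        value = float(sample)
        for stage, pred in zip(self.stages, self.preds):
            value = stage.update(value, pred)


def encode(samples, orders=(16, 4)):
    model = CascadedPredictor(orders)
    residuals = []
    for x in samples:
        residuals.append(x - model.predict())
        model.update(x)
    return residuals


def decode(residuals, orders=(16, 4)):
    model = CascadedPredictor(orders)   # mirrors the encoder state exactly
    samples = []
    for r in residuals:
        x = r + model.predict()
        samples.append(x)
        model.update(x)
    return samples


# Synthetic test tone, invented for illustration.
signal = [round(1000 * math.sin(0.1 * n)) for n in range(200)]
residuals = encode(signal)
restored = decode(residuals)
```

The decoder runs the same predictor on the samples it has already reconstructed, so adding each residual to its prediction recovers the input exactly. An entropy coder for the residuals would follow in a complete system.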
9:30, AUDIO-P1.7
A SCALABLE AND PROGRESSIVE AUDIO CODEC
L. ATLAS, M. VINTON
A source coding technique for variable, bandwidth-constrained channels such as the Internet must do two things: offer high quality at low data rates, and adapt gracefully to changes in available bandwidth. Here we propose an audio coding algorithm that is superior on both counts. It is inherently scalable, meaning that channel conditions can be matched without the need for additional computation. Moreover, it is compact: in subjective tests our algorithm, coded at 32kb/s/channel, outperformed MPEG-1 Layer 3 (MP3) coded at 56kb/s/channel (both at 44.1kHz). We achieve this simultaneous increase in compression and scalability through use of a two-dimensional transform that concentrates relevant information into a small number of coefficients.
9:30, AUDIO-P1.8
SINUSOIDAL MODELING OF AUDIO AND SPEECH USING PSYCHOACOUSTIC-ADAPTIVE MATCHING PURSUITS
R. HEUSDENS, R. VAFIN, W. KLEIJN
In this paper, we propose a segment-based matching pursuit algorithm that takes the psychoacoustical properties of the human auditory system into account. Rather than scaling the dictionary elements according to auditory perception, we define a psychoacoustic-adaptive
norm on the signal space which can be used for assigning the dictionary elements to the individual segments in a rate-distortion optimal manner. The new algorithm is asymptotically equal to signal-to-mask ratio based algorithms in the limit of infinite analysis window length. However, the new algorithm provides a significantly improved selection of the dictionary elements for finite window length.
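The idea of changing the norm rather than scaling the dictionary can be shown with a small matching pursuit sketch. It assumes a fixed weighted inner product with one positive weight per vector component as a stand-in for a perceptual weighting; the paper's psychoacoustic-adaptive norm, segmentation, and rate-distortion assignment are not reproduced, and all names are hypothetical.

```python
def w_inner(x, y, w):
    """Weighted inner product: sum of w[i] * x[i] * y[i]."""
    return sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))


def weighted_matching_pursuit(signal, dictionary, weights, num_atoms):
    """Greedy atom selection under the norm induced by w_inner.

    Returns (picks, residual), where picks is a list of (atom_index, gain).
    """
    residual = list(signal)
    picks = []
    for _ in range(num_atoms):
        def energy_removed(k):
            g = dictionary[k]
            return (w_inner(residual, g, weights) ** 2
                    / w_inner(g, g, weights))

        k = max(range(len(dictionary)), key=energy_removed)
        g = dictionary[k]
        gain = w_inner(residual, g, weights) / w_inner(g, g, weights)
        residual = [r - gain * gi for r, gi in zip(residual, g)]
        picks.append((k, gain))
    return picks, residual


# Invented toy dictionary and signal.
atoms = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
x = [3, 2, 1]
flat_picks, _ = weighted_matching_pursuit(x, atoms, [1, 1, 1], 1)
weighted_picks, _ = weighted_matching_pursuit(x, atoms, [1, 10, 1], 1)
```

With uniform weights the largest component is selected first. Raising the weight of the second component makes that atom the first choice, although the dictionary itself is unchanged.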
9:30, AUDIO-P1.9
MODIFYING TRANSIENTS FOR EFFICIENT CODING OF AUDIO
R. VAFIN, R. HEUSDENS, W. KLEIJN
In this paper, we propose a method for efficient representation of transients in audio signals. We estimate the transient component of an original audio signal and modify the locations of the transients in such a way that the transients can occur only at locations defined by a relatively coarse time grid. This procedure allows an efficient representation of transients with damped sinusoids. We also verify that the introduced modifications do not result in a perceptual difference between the original and the modified audio signals.
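Restricting transient locations to a coarse time grid amounts to quantizing each estimated onset position. The sketch below shows only that quantization step, with a hypothetical grid spacing in samples; estimating the transients and modifying the audio signal accordingly are outside its scope.

```python
def snap_to_grid(positions, grid_step):
    """Move each non-negative sample position to the nearest grid multiple."""
    return [((p + grid_step // 2) // grid_step) * grid_step
            for p in positions]


def shifts(positions, grid_step):
    """Signed displacement applied to each position, in samples."""
    return [q - p for p, q in zip(positions, snap_to_grid(positions, grid_step))]


# Invented onset positions in samples, with a 256-sample grid.
onsets = [130, 515, 1000]
snapped = snap_to_grid(onsets, 256)
moved_by = shifts(onsets, 256)
```

Each snapped position can then be sent as a small grid index instead of a full sample position, and no position moves by more than half the grid spacing.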
9:30, AUDIO-P1.10
PERCEPTUAL SEGMENTATION AND COMPONENT SELECTION IN SINUSOIDAL REPRESENTATIONS OF AUDIO
A. SPANIAS, T. PAINTER
This paper presents two fundamental enhancements in a hybrid audio signal model consisting of sinusoidal, transient, and noise (STN) components. The first enhancement involves a novel application of a perceptual metric for optimal time segmentation for the analysis of transients. In particular, Moore and Glasberg's model of partial loudness is modified for use with general signals and then integrated into a novel time segmentation scheme. The second and perhaps more significant STN enhancement is concerned with a new methodology for ranking and selection of the most perceptually relevant sinusoids.