
Abstract: Session AE-2


AE-2.1  

On the utilization of overshoot effects in low-delay audio coding
Aki Härmä, Unto K Laine, Matti Karjalainen (Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing)

In low-delay audio coding (coding delay < 5 ms) there is no time for detailed spectral modeling in the case of brief percussive sounds, e.g., castanets, and the onsets of music or speech sounds. On the other hand, it is known from psychoacoustic experiments that the ear is not accurate near the onset of a wideband sound. In this paper, we study the audibility of coding errors near the onsets of musical sounds in a simulated low-delay audio codec based on frequency-warped linear prediction. It is suggested that for many musical transients it is sufficient to reproduce a rough temporal and spectral envelope of the original signal during the first 5-10 ms. Preliminary listening tests support this idea. It is proposed that the overshoot effect of hearing could be exploited to enhance the performance of a low-delay audio coding scheme.
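As a rough illustration of the frequency-warped linear prediction underlying the simulated codec, the sketch below (not the authors' implementation) fits prediction coefficients to a warped autocorrelation sequence obtained by replacing unit delays with first-order allpass sections; the warping parameter, model order, and frame length are illustrative assumptions.

```python
import numpy as np
from scipy.signal import lfilter

def warped_autocorrelation(x, lam, order):
    """Autocorrelation on a warped frequency axis: each 'lag' is one more
    pass through the allpass D(z) = (z^-1 - lam)/(1 - lam*z^-1).
    lam ~ 0.7 roughly Bark-warps 44.1 kHz material (illustrative value)."""
    r = np.empty(order + 1)
    y = x.copy()
    r[0] = np.dot(x, y)
    for k in range(1, order + 1):
        y = lfilter([-lam, 1.0], [1.0, -lam], y)  # one allpass delay stage
        r[k] = np.dot(x, y)
    return r

def levinson(r):
    """Levinson-Durbin recursion: autocorrelation -> prediction coefficients."""
    a, e = [1.0], r[0]
    for i in range(1, len(r)):
        acc = sum(a[j] * r[i - j] for j in range(len(a)))
        k = -acc / e
        a = [u + k * v for u, v in zip(a + [0.0], [0.0] + a[::-1])]
        e *= 1.0 - k * k
    return np.array(a), e

# Example: fit a 12th-order warped LPC envelope to one ~5 ms frame.
frame = np.random.randn(220)                     # stand-in for 5 ms at 44.1 kHz
a, err = levinson(warped_autocorrelation(frame, lam=0.7, order=12))
```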


AE-2.2  

Scalable Audio Coder Based on Quantizer Units of MDCT Coefficients
Akio Jin, Takehiro Moriya (NTT Human Interface Laboratories), Takeshi Norimatsu, Mineo Tsushima, Tomokazu Ishikawa (Matsushita Electric Industrial Co., Ltd.)

A scalable codec has been constructed using transform coding and basic modules for the scalable encoder and decoder. It allows users to choose from a variety of scalable configurations in the frequency domain. The basic module is a quantizer that can quantize MDCT (Modified DCT) coefficients transformed from a variety of frequency regions. This module mainly works at bit rates of more than 8 kbit/s. We can also change the target frequency regions of the basic module's input-output signals in each transform frame; i.e., we can change the scalable structure according to the nature of the input signals. In the scalable codec described here, the input-output signals are monaural and the sampling frequency is 24 kHz. The total bit rate of this scalable codec is more than 8 kbit/s. Subjective quality evaluation tests, mainly for musical sound sources, showed that its sound quality is better than that of an MPEG2-layer3 codec at 8, 16, and 24 kbit/s when our scalable codec is constructed of 8-kbit/s basic modules. In combination with AAC (Advanced Audio Coding), our scalable codec will be adopted as an international standard in ISO/IEC-MPEG-4/Audio.
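The sketch below is a generic illustration of the kind of building blocks described here: an MDCT of one frame and a "basic module" that scalar-quantizes one frequency region, with an enhancement module refining its residual. The window, region boundaries, and step sizes are assumptions, not the paper's design, and no bit allocation or entropy coding is shown.

```python
import numpy as np

def mdct(frame):
    """MDCT of one 2N-sample frame -> N coefficients (sine window)."""
    N = len(frame) // 2
    n = np.arange(2 * N)
    win = np.sin(np.pi / (2 * N) * (n + 0.5))
    k = np.arange(N)[:, None]
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k + 0.5))
    return basis @ (win * frame)

def basic_module(coeffs, lo, hi, step):
    """Quantize MDCT bins [lo, hi) with step size `step`.
    Returns integer indices (to be coded) and the local reconstruction."""
    idx = np.round(coeffs[lo:hi] / step).astype(int)
    return idx, idx * step

# Base layer covers the low band; an enhancement layer re-quantizes the
# residual of the same region with a finer step (illustrative values).
frame = np.random.randn(2048)
c = mdct(frame)
idx0, rec0 = basic_module(c, 0, 256, step=0.50)
residual = c - np.pad(rec0, (0, len(c) - 256))
idx1, rec1 = basic_module(residual, 0, 256, step=0.125)
```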


AE-2.3  

An Algorithm for Compression of Diverse Speech and Audio Signals
Trevor R Trinkaus, Mark A Clements (Georgia Institute of Technology)

A compression scheme for diverse speech and audio signals is proposed. In this scheme, signals are analyzed with a 2-band QMF filterbank, followed by the application of a Modulated Lapped Biorthogonal Transform (MLBT) to each of the filter bank channels. Subsequent encoding of the transform coefficients is performed using Laplacian-optimized scalar and vector quantizers, whose rates are determined by an estimated noise threshold, i.e., a masking threshold. Listening tests show that the coder's quality at 32 kbit/s is preferred over that of the ITU G.722 coder at 64 kbit/s for speech, music, and more diverse signals consisting of speech in the presence of eventful background sounds. Both the delay of the coder, at 40 ms, and its complexity are moderate.
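As a small illustration of the front-end structure only (not the authors' filters), the following sketch performs a two-band QMF analysis split; the prototype lowpass is a generic windowed-sinc design rather than a codec-grade QMF prototype, and the MLBT and quantization stages are not shown.

```python
import numpy as np
from scipy.signal import firwin, lfilter

def qmf_analysis(x, taps=32):
    """Two-band QMF analysis: lowpass/highpass filtering, then decimation
    by 2.  H1(z) = H0(-z) is the quadrature mirror of the prototype H0."""
    h0 = firwin(taps, 0.5)                     # illustrative halfband prototype
    h1 = h0 * (-1.0) ** np.arange(taps)        # mirror: flip every other tap sign
    low = lfilter(h0, 1.0, x)[::2]
    high = lfilter(h1, 1.0, x)[::2]
    return low, high

low_band, high_band = qmf_analysis(np.random.randn(16000))
```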


AE-2.4  

A New Forward Masking Model and Its Application to Perceptual Audio Coding
Yuan-Hao Huang (Room 333, Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan R.O.C), Tzi-Dar Chiueh (Room 511, Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan R.O.C)

This paper presents a new forward masking model for perceptual audio coding. The model exploits adaptation of the peripheral sensory and neural elements in the auditory system, which is often deemed the cause of forward masking. The nonlinearity of the ear is modeled by a nonlinear analog circuit described by difference equations. We incorporate this model into the MPEG Layer III audio coding scheme and construct a masking plane in the frequency-time space. At the cost of some extra computation, the new audio coding scheme can improve the sound quality of the decoded audio signals. In our experiments, subjective and objective sound quality measurements show that, to achieve the same reconstructed sound quality, the new scheme requires 12% to 23% fewer bits than the original MPEG Layer III scheme.
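For orientation only, the snippet below shows a common, simplified way of folding a forward-masking contribution into a per-frame masking threshold: the previous frame's threshold decays and is combined with the current simultaneous-masking threshold. This is not the adaptation-circuit model proposed in the paper, and the decay constant and band count are assumptions.

```python
import numpy as np

def combined_threshold(simultaneous_thr, prev_thr, decay_db_per_frame=6.0):
    """Combine simultaneous masking with a decayed copy of the previous
    frame's threshold (generic forward-masking rule, thresholds in dB)."""
    return np.maximum(simultaneous_thr, prev_thr - decay_db_per_frame)

# Build a frequency-time masking plane frame by frame (stand-in data).
frames = np.random.uniform(-60.0, -20.0, size=(100, 32))   # dB thresholds
prev = np.full(32, -np.inf)
plane = []
for thr in frames:
    prev = combined_threshold(thr, prev)
    plane.append(prev)
masking_plane = np.array(plane)                # shape: (frames, bands)
```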


AE-2.5  

Best Wavelet-Packet Bases for Audio Coding Using Perceptual and Rate-Distortion Criteria
Markus Erne, George Moschytz (Swiss Federal Institute of Technology, Signal- and Information Processing Laboratory), Christof Faller (Swiss Federal Institute of Technology)

This paper presents a new approach to the adaptation of a wavelet filterbank based on perceptual and rate-distortion criteria. The system makes use of a wavelet-packet transform in which each subband can have an individual time segmentation. Boundary effects are avoided by using overlapping blocks of samples, so bases can be switched at every tree level without affecting other subbands. A modified psychoacoustic model using perceptual entropy controls the switching of the wavelet filterbank, and the individual time segmentation of every subband makes it possible to take advantage of temporal masking. Additionally, a rate-distortion measure can control the filterbank for lossless audio coding applications, or in cases where large coding gains can be achieved without using perceptual criteria. The weights of the perceptual measure and the rate-distortion measure can be selected individually, allowing a trade-off between lossless coding and perceptual coding.
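The following sketch shows a plain Coifman-Wickerhauser best-basis search over a wavelet-packet tree using an entropy-style cost; in the paper the split decision is driven by perceptual and rate-distortion measures instead, so the cost function, wavelet, and depth below are placeholders (PyWavelets is assumed to be available).

```python
import numpy as np
import pywt

def cost(c):
    """Entropy-style cost of a coefficient vector (illustrative)."""
    p = c ** 2 / np.sum(c ** 2)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def best_basis(x, wavelet="db4", max_level=4):
    """Bottom-up best-basis search: split a node only if its two children
    together are cheaper than the node itself.
    Returns (total cost, list of leaf coefficient arrays)."""
    if max_level == 0 or len(x) < 2 * pywt.Wavelet(wavelet).dec_len:
        return cost(x), [x]
    a, d = pywt.dwt(x, wavelet)                # one wavelet-packet split
    ca, leaves_a = best_basis(a, wavelet, max_level - 1)
    cd, leaves_d = best_basis(d, wavelet, max_level - 1)
    if ca + cd < cost(x):
        return ca + cd, leaves_a + leaves_d
    return cost(x), [x]

total_cost, leaves = best_basis(np.random.randn(4096))
```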


AE-2.6  

Improving Perceptual Coding of Narrowband Audio Signals at Low Rates
Hossein Najafzadeh-Azghandi, Peter Kabal (McGill University)

This paper discusses perceptual coding of narrowband audio signals at low rates. In particular, it proposes a new error measure that shapes the noise inside the critical bands, a window-switching criterion based on the temporal masking effect of the hearing system, a more accurate model of the simultaneous masking effect, perceptually based bit allocation algorithms built on two different approaches to quantization noise shaping, and a predictive vector quantization scheme to code the scale factors. The resulting coding scheme outperforms existing low-rate speech coders for non-speech signals.
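As a generic reference point for perceptually based bit allocation (not the paper's algorithms), the sketch below greedily assigns bits to the band with the worst noise-to-mask ratio, assuming roughly 6 dB of quantization-noise reduction per bit; band energies, thresholds, and the bit budget are stand-in values.

```python
import numpy as np

def allocate_bits(band_energy, mask_threshold, total_bits):
    """Greedy perceptual bit allocation over subbands.
    band_energy, mask_threshold: linear power per band."""
    nmr_db = 10.0 * np.log10(band_energy / mask_threshold)  # noise-to-mask ratio
    bits = np.zeros(len(band_energy), dtype=int)
    for _ in range(total_bits):
        worst = int(np.argmax(nmr_db))         # band where noise is most audible
        bits[worst] += 1
        nmr_db[worst] -= 6.02                  # one more bit buys ~6 dB of SNR
    return bits

bits = allocate_bits(np.random.rand(20) + 0.1,
                     np.random.rand(20) * 0.05 + 0.01,
                     total_bits=160)
```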


AE-2.7  

Subband-Domain Filtering of MPEG Audio Signals
Chris A Lanciani, Ronald W Schafer (Georgia Institute of Technology)

The cosine-modulated filter bank is commonly used for the time-frequency decomposition of audio signals; for example, it is a basic element of the MPEG-1 and MPEG-2 audio coding standards. While this filter bank is not perfectly reconstructing, it does provide for the cancellation of aliasing components introduced during the analysis decomposition. If the subband signals are to be processed, care must be taken to preserve their properties so that the aliased terms are still canceled in the synthesis filter bank despite the modification. In this paper, a framework is provided for the generation and application of arbitrary FIR filters to signals that have been decomposed using the MPEG filter bank.
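To make the setting concrete, the sketch below builds a generic M-band cosine-modulated analysis filterbank of the kind the MPEG subband decomposition is based on. The prototype here is a plain windowed-sinc lowpass, not the MPEG-1 prototype window, and the paper's subband-domain FIR filtering framework itself is not reproduced.

```python
import numpy as np
from scipy.signal import firwin, lfilter

def cosine_modulated_analysis(x, num_bands=32, proto_len=512):
    """Split x into `num_bands` subbands, each decimated by `num_bands`,
    using cosine modulation of a single lowpass prototype filter."""
    M = num_bands
    h = firwin(proto_len, 1.0 / (2 * M))       # illustrative prototype lowpass
    n = np.arange(proto_len)
    subbands = []
    for k in range(M):
        hk = h * np.cos((2 * k + 1) * (n - M / 2) * np.pi / (2 * M))
        subbands.append(lfilter(hk, 1.0, x)[::M])
    return np.array(subbands)

S = cosine_modulated_analysis(np.random.randn(48000))   # shape: (32, 1500)
```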




Last Update: February 4, 1999 (Ingo Höntsch)