Chair: Karlheinz Brandenburg, IIG
Anibal J. S. Ferreira, INESC (PORTUGAL)
Perceptual audio coders achieve good coding gains by efficiently removing perceptually irrelevant components of the audio signal as well as statistical signal redundancies. To reach high compression ratios without reducing the subjective quality of the encoded audio signal, it is necessary to identify the critically interdependent functional units of the encoding algorithm and to optimize their performance jointly. A flexible, interactive simulation and analysis environment has been programmed to assist the development and optimization of a new perceptual coder. The main features of this environment are explained, and the aspects found to most limit encoding performance are presented.
S. Boland, Queensland University of Technology (AUSTRALIA)
M. Deriche, Queensland University of Technology (AUSTRALIA)
Most current work in high-quality audio coding falls into one of two categories: transform coding or subband coding. LPC coders, being based on models of the human voice production system, have been found inappropriate for modelling music and other non-speech sounds. The Multipulse LPC model has been shown to be a better model for such signals. In this paper we propose to improve the quality of the Multipulse model by first passing the signal of interest through a filter bank and then extracting the Multipulse parameters from each of the bandpass filter outputs. The filter bank is designed using the idea of the wavelet decomposition. Both the Multipulse model and the wavelet decomposition are well known, but their combination has not yet been exploited. This combination is expected to open a new approach to high-quality, low-bit-rate audio coding.
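The abstract does not spell out how the Multipulse parameters are extracted, so the following is only a minimal sketch of the classic sequential (greedy) multipulse search that would run on each bandpass output. It assumes the impulse response `h` of the band's LPC synthesis filter is already known, and it omits the perceptual weighting filter and the joint re-optimization of pulse amplitudes that practical multipulse coders usually add. The function name `multipulse_search` is hypothetical.

```python
def multipulse_search(target, h, num_pulses):
    """Greedy sequential multipulse excitation search (illustrative sketch).

    target: samples of one subband frame to be matched.
    h: impulse response of the (assumed known) LPC synthesis filter 1/A(z).
    Returns (pulses, residual), where pulses is a list of (position, amplitude).
    """
    n = len(target)
    h = (list(h) + [0.0] * n)[:n]  # pad or truncate the impulse response to n
    residual = list(target)
    pulses = []
    for _ in range(num_pulses):
        best_pos, best_score, best_gain = None, -1.0, 0.0
        for pos in range(n):
            # Correlation of the residual with h shifted to start at pos.
            corr = sum(residual[k] * h[k - pos] for k in range(pos, n))
            energy = sum(h[k - pos] ** 2 for k in range(pos, n))
            if energy == 0.0:
                continue
            score = corr * corr / energy  # squared-error reduction at this pos
            if score > best_score:
                best_pos, best_score, best_gain = pos, score, corr / energy
        if best_pos is None:
            break
        pulses.append((best_pos, best_gain))
        # Subtract the synthesized contribution of the chosen pulse.
        for k in range(best_pos, n):
            residual[k] -= best_gain * h[k - best_pos]
    return pulses, residual
```

Each iteration places the pulse that most reduces the squared error given the pulses already chosen, which is what makes the search cheap but suboptimal.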
J. Princen, AT&T Bell Laboratories (USA)
J.D. Johnston, AT&T Bell Laboratories (USA)
In this paper we present a high quality audio coding system based on a novel nonuniform modulated filterbank coupled with time-varying cosine modulated filterbanks in a cascade architecture. The system makes use of psychoacoustic thresholds in a natural way to adapt the resolution of the filterbank to achieve high coding gain on a wide range of signal types. Results show that the system provides excellent quality at 64 kb/s and good quality at 48 kb/s for monophonic coding.
Mark Black, The University of Western Ontario
Mehmet Zeytinoglu, Ryerson Polytechnic University (CANADA)
This paper presents a new audio compressor based on the wavelet packet (WP) decomposition. A major drawback of current wideband multichannel audio compressors is the large computational effort required by the subband decomposition and the psychoacoustic model. We integrate the psychoacoustic model with the design of the decomposition filterbank, which separates the wideband signal into 28 subbands that closely approximate the critical bands. The psychoacoustic model exploits noise masking and joint stereo coding to compress the subband signals. We demonstrate that the WP decomposition provides sufficient resolution to extract the time-frequency characteristics of the wideband input signal. The WP-based audio compressor provides transparent sound quality at compression ratios comparable to those of the MPEG compressor with less than one third of the computational effort.
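A wavelet packet decomposition is a tree of two-band splits in which, unlike the octave-band wavelet transform, high-pass branches may also be split further. The sketch below assumes orthonormal Haar filters as a stand-in for the paper's filters and describes the tree with a hypothetical nested-tuple format (`None` for a leaf, `(low, high)` for a split); it does not reproduce the 28-band critical-band tree. Because the Haar pair is orthonormal, synthesis reconstructs the input exactly and the subband energies sum to the input energy.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_split(x):
    """One two-band orthonormal Haar analysis stage, downsampled by 2."""
    low = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return low, high


def haar_merge(low, high):
    """Inverse of haar_split."""
    x = []
    for lo, hi in zip(low, high):
        x.append((lo + hi) / SQRT2)
        x.append((lo - hi) / SQRT2)
    return x


def wp_analyze(x, tree):
    """Return the leaf subbands of the (hypothetical) tree, left to right."""
    if tree is None:
        return [x]
    low, high = haar_split(x)
    return wp_analyze(low, tree[0]) + wp_analyze(high, tree[1])


def wp_synthesize(bands, tree):
    """Rebuild the signal from leaf subbands given in wp_analyze order."""
    def build(t, i):
        if t is None:
            return bands[i], i + 1
        low, i = build(t[0], i)
        high, i = build(t[1], i)
        return haar_merge(low, high), i

    signal, used = build(tree, 0)
    if used != len(bands):
        raise ValueError("number of bands does not match the tree")
    return signal
```

For example, `(((None, None), (None, None)), None)` gives four narrow bands in the lower half of the spectrum and one wide band in the upper half, mimicking the way critical bands widen with frequency. After a high-pass split the spectrum is mirrored, so the children of a high-pass node come out in reversed frequency order.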
Shiufun Cheung, Massachusetts Institute of Technology (USA)
Jae S. Lim, Massachusetts Institute of Technology (USA)
Acoustic signal representations used in current audio coding algorithms can be improved by incorporating biorthogonality into Malvar's Extended Lapped Transform (ELT). Biorthogonality increases the number of degrees of freedom and so allows more flexibility in the design of the analysis and synthesis windows. This paper examines this increase for two special cases and demonstrates the importance of the additional flexibility for the proper implementation of psychoacoustic modeling, a feature central to all modern audio compression schemes.
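The ELT typically uses basis functions longer than two blocks; as a smaller illustration of what biorthogonality buys, the sketch below uses the two-block case (the MDCT, i.e. the MLT), where the analysis window `w_a` and synthesis window `w_s` may differ. With symmetric windows of length 2M, perfect reconstruction requires w_a(n)w_s(n) + w_a(n+M)w_s(n+M) = 1 for 0 ≤ n < M, so any symmetric analysis window with nonvanishing w_a(n)^2 + w_a(n+M)^2 admits the synthesis window w_s(n) = w_a(n) / (w_a(n)^2 + w_a(n+M)^2). The squared-sine analysis window used in the example is an arbitrary choice that would not give perfect reconstruction in an orthogonal (w_s = w_a) design. The direct O(M^2) transforms are for clarity, not speed.

```python
import math


def mdct(frame, window):
    """Windowed MDCT of a 2M-sample frame -> M coefficients."""
    m = len(frame) // 2
    n0 = 0.5 + m / 2.0
    return [sum(window[n] * frame[n] * math.cos(math.pi / m * (n + n0) * (k + 0.5))
                for n in range(2 * m))
            for k in range(m)]


def imdct(coeffs, window):
    """Inverse MDCT (scaled by 2/M) followed by the synthesis window."""
    m = len(coeffs)
    n0 = 0.5 + m / 2.0
    return [window[n] * (2.0 / m) *
            sum(coeffs[k] * math.cos(math.pi / m * (n + n0) * (k + 0.5)) for k in range(m))
            for n in range(2 * m)]


def synthesis_window(analysis):
    """Biorthogonal synthesis window for a symmetric analysis window."""
    m = len(analysis) // 2
    out = []
    for n in range(2 * m):
        j = n % m
        d = analysis[j] ** 2 + analysis[j + m] ** 2
        out.append(analysis[n] / d)
    return out


def analyze_synthesize(x, w_a, w_s):
    """MDCT analysis, IMDCT synthesis and overlap-add with hop size M.

    Assumes len(x) is a multiple of M; one hop of zeros is padded at each
    end so that every input sample is covered by two frames.
    """
    m = len(w_a) // 2
    padded = [0.0] * m + list(x) + [0.0] * m
    y = [0.0] * len(padded)
    for start in range(0, len(padded) - m, m):
        frame = padded[start:start + 2 * m]
        rec = imdct(mdct(frame, w_a), w_s)
        for i, v in enumerate(rec):
            y[start + i] += v
    return y[m:m + len(x)]
```

Usage: with `w_a = [math.sin(math.pi * (n + 0.5) / 16) ** 2 for n in range(16)]` (M = 8) and `w_s = synthesis_window(w_a)`, `analyze_synthesize(x, w_a, w_s)` returns `x` up to rounding error even though `w_s` differs from `w_a`.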
J-M. LeRoux, Matra Communication
R. Lefebvre, University of Sherbrooke (CANADA)
J-P. Adoul, University of Sherbrooke (CANADA)
This paper reports on the specific contribution of the Wavelet Transform (WT) to the TCX coding model for audio signals. TCX, or Transform Coded eXcitation, is a frame-based coding algorithm that uses both time-domain (linear prediction) and frequency-domain (transform coding) approaches to exploit signal redundancies as well as frequency masking. Previous work on TCX used the Discrete Fourier Transform (DFT), and the quality for highly non-stationary signals such as percussion was less than satisfactory. The WT has therefore been investigated as a better compromise between time and frequency resolution.
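TCX pairs a linear-prediction stage with a transform stage. A minimal sketch of the LP stage alone, assuming the common autocorrelation method solved by the Levinson-Durbin recursion (the abstract gives neither the paper's windowing nor its predictor order): it returns the coefficients of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and filters a frame through A(z) to obtain the prediction residual. The transform coding stage (DFT or WT) is not shown.

```python
def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (a, err) with a[0] = 1, so the predictor is
    x_hat[n] = -sum(a[i] * x[n - i] for i in 1..order),
    and err is the final prediction-error power.
    """
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lp_residual(x, a):
    """Filter x through A(z): e[n] = sum(a[i] * x[n - i])."""
    return [sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0)
            for n in range(len(x))]
```

For the autocorrelation of a first-order process, `levinson_durbin([1.0, 0.5, 0.25], 2)` yields a = [1, -0.5, 0] with error power 0.75: the second-order coefficient vanishes, as expected.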
P.E. Kudumakis, King's College London (UK)
M.B. Sandler, King's College London (UK)
The performance of several wavelet families, including for comparison a well-known family of QMFs, is investigated for low-bit-rate coding of audio signals. To assess the coding gain of these wavelets, both octave and uniform subband coding schemes have been evaluated, using both constant and dynamic bit allocation, with and without noiseless (entropy) Huffman coding. The trade-off between the complexity of these wavelets, measured by the number of filter coefficients, and the quality of the decompressed audio signals, measured by segmental SNR (dB), is presented at different bit rates. In addition, this evaluation suggests that perceptually transparent quality of monophonic signals can be achieved at 24 kbit/s (Fs = 8 kHz, 3 bits/sample) for speech applications and at 64 kbit/s (Fs = 48 kHz, 1.33 bits/sample) for music applications such as digital audio transmission and storage.
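The quality measure and the rate figures above can be made concrete with a short sketch. Segmental SNR averages the per-frame SNR in dB, so quiet passages count as much as loud ones; the 256-sample frame length and the small `eps` guard against silent frames are assumptions, and many published definitions also clamp each frame's SNR to a fixed range, which this sketch does not do. The bits/sample figures follow from dividing the bit rate by the sampling rate.

```python
import math


def segmental_snr(reference, decoded, frame_len=256, eps=1e-10):
    """Mean of per-frame SNRs in dB over complete frames of frame_len samples."""
    snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(r * r for r in ref)
        noise_energy = sum((r - d) ** 2 for r, d in zip(ref, dec))
        snrs.append(10.0 * math.log10((signal_energy + eps) / (noise_energy + eps)))
    if not snrs:
        raise ValueError("signal shorter than one frame")
    return sum(snrs) / len(snrs)


def bits_per_sample(bit_rate_bps, sample_rate_hz):
    """Average number of coded bits per (mono) sample."""
    return bit_rate_bps / sample_rate_hz
```

For instance, `bits_per_sample(24000, 8000)` gives 3.0 and `bits_per_sample(64000, 48000)` about 1.33, matching the figures quoted in the abstract.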
Andrew L. Adams, Harris RF Communications
Steven W. McLaughlin, Rochester Institute of Technology (USA)
We consider the lossless compression of high-fidelity (e.g., 16-bit) digital audio using adaptive linear prediction. Both linear predictive coding (LPC) and least mean squares (LMS) predictors are considered. Preliminary results are presented for the compression of industry-standard Sound Quality Assessment Material (SQAM) [1] samples from 16 bits to 1.5 to 3 bits per sample. Previous results by others on the same audio source were in the 8-bit range.
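On the LMS side, a lossless scheme works because the decoder can rerun the identical adaptation on already-decoded samples. The sketch below assumes a normalized LMS (NLMS) predictor with a hypothetical order of 8 and step size 0.5 (the abstract does not give the authors' variant or settings). The prediction is rounded to an integer, so the residual is an integer and reconstruction is exact. Entropy coding of the residual, which is where the bit-rate saving is realized, is not shown.

```python
def _predict(w, history):
    # Rounded prediction: keeps the residual integer so the decoder can mirror it.
    return round(sum(wi * hi for wi, hi in zip(w, history)))


def _nlms_update(w, history, e, mu, eps):
    # Normalized LMS step driven by the integer residual e.
    norm = eps + sum(h * h for h in history)
    for i, h in enumerate(history):
        w[i] += mu * e * h / norm


def nlms_encode(samples, order=8, mu=0.5, eps=1.0):
    """Map integer samples to integer prediction residuals."""
    w = [0.0] * order
    history = [0] * order  # history[0] is the most recent sample
    residuals = []
    for s in samples:
        e = s - _predict(w, history)
        residuals.append(e)
        _nlms_update(w, history, e, mu, eps)
        history = [s] + history[:-1]
    return residuals


def nlms_decode(residuals, order=8, mu=0.5, eps=1.0):
    """Invert nlms_encode by running the same adaptation on decoded samples."""
    w = [0.0] * order
    history = [0] * order
    samples = []
    for e in residuals:
        s = _predict(w, history) + e
        samples.append(s)
        _nlms_update(w, history, e, mu, eps)
        history = [s] + history[:-1]
    return samples
```

Both sides must use exactly the same parameters and arithmetic; any mismatch in the update makes the decoder's predictor drift away from the encoder's.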
Naoki Iwakami, NTT Human Interface Labs. (JAPAN)
Takehiro Moriya, NTT Human Interface Labs. (JAPAN)
Satoshi Miki, NTT Human Interface Labs. (JAPAN)
A new audio coding method is proposed. This method, called transform-domain weighted interleave vector quantization (TwinVQ), achieves high-quality reproduction at bit rates below 64 kbit/s. It is a transform coder based on the modified discrete cosine transform (MDCT). The method introduces three novel techniques: flattening of the MDCT coefficients by the spectrum of the linear predictive coding (LPC) coefficients; interframe backward prediction for flattening the MDCT coefficients; and weighted interleave vector quantization. Evaluation tests showed that the quality of the reproduction of TwinVQ exceeded that of an MPEG Layer II coder at the same bit rate.
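Two of the ingredients named in TwinVQ can be sketched in isolation: interleaving, which builds each subvector from every k-th (flattened) MDCT coefficient so that each subvector spans the whole band, and a weighted squared-error codebook search. The function names, the toy codebook, and the origin of the weights (for example, a perceptual or spectral-envelope weighting) are assumptions for illustration; the codebook design and the LPC-based flattening are not shown.

```python
def interleave(coeffs, num_subvectors):
    """Subvector j takes coefficients j, j + K, j + 2K, ... (K = num_subvectors)."""
    if len(coeffs) % num_subvectors:
        raise ValueError("length must be a multiple of num_subvectors")
    return [list(coeffs[j::num_subvectors]) for j in range(num_subvectors)]


def deinterleave(subvectors):
    """Inverse of interleave."""
    k = len(subvectors)
    out = [0.0] * (k * len(subvectors[0]))
    for j, sub in enumerate(subvectors):
        out[j::k] = sub
    return out


def weighted_vq(vector, weights, codebook):
    """Index of the codeword minimizing sum(w_i * (x_i - c_i)^2)."""
    def distortion(codeword):
        return sum(w * (x - c) ** 2 for w, x, c in zip(weights, vector, codeword))
    return min(range(len(codebook)), key=lambda i: distortion(codebook[i]))
```

The weighting can change the chosen codeword: with `codebook = [[1.0, 2.0], [0.0, 0.0]]`, the vector `[1.0, 0.0]` maps to index 1 under equal weights but to index 0 when the first component is weighted ten times more heavily.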
J. Benesty, Telecom Paris
F. Amand, CNET LAA/TSS/CMC
A. Gilloire, CNET LAA/TSS/CMC
Y. Grenier, Telecom Paris (FRANCE)
It is likely that stereophonic (and, more generally, multichannel) sound pick-up, transmission, and reproduction will be implemented in future teleconference systems to provide users with enhanced quality. Adequate solutions must therefore be found to the problem of stereophonic acoustic echo that will arise in such systems. In this paper we explain the differences between mono and two-channel systems and compare the behavior of classical two-channel adaptive algorithms with that of the same algorithms in the mono case. We also outline a new NLMS-like algorithm, derived from the two-channel RLS algorithm, as the first member of a family of improved two-channel adaptive filters.
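For reference, the classical two-channel NLMS echo canceller that the abstract contrasts with its new algorithm can be sketched as one adaptive filter whose input vector stacks the recent samples of both loudspeaker signals. The echo paths, filter length, and step size in `demo` are hypothetical, and the inputs there are independent white noise. With the strongly correlated loudspeaker signals of real stereo teleconferencing, the two-channel problem becomes ill-conditioned and the filter can cancel the echo without identifying the true paths, which is the difficulty that motivates improved algorithms. The RLS-derived algorithm proposed in the paper is not reproduced here.

```python
import random


def fir(h, x):
    """Causal FIR filtering y[n] = sum(h[k] * x[n - k])."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def two_channel_nlms(x1, x2, mic, taps, mu=0.5, eps=1e-6):
    """Classical two-channel NLMS: one update normalized by both channels' energy."""
    w1, w2 = [0.0] * taps, [0.0] * taps
    buf1, buf2 = [0.0] * taps, [0.0] * taps
    errors = []
    for a, b, d in zip(x1, x2, mic):
        buf1 = [a] + buf1[:-1]  # most recent loudspeaker-1 sample first
        buf2 = [b] + buf2[:-1]
        echo_estimate = (sum(w * u for w, u in zip(w1, buf1)) +
                         sum(w * u for w, u in zip(w2, buf2)))
        e = d - echo_estimate
        norm = eps + sum(u * u for u in buf1) + sum(u * u for u in buf2)
        g = mu * e / norm
        w1 = [w + g * u for w, u in zip(w1, buf1)]
        w2 = [w + g * u for w, u in zip(w2, buf2)]
        errors.append(e)
    return w1, w2, errors


def demo(num_samples=4000, seed=1):
    """Return the largest coefficient error after adapting to hypothetical paths."""
    rng = random.Random(seed)
    h1 = [0.5, -0.3, 0.2, 0.1]   # hypothetical echo path, loudspeaker 1
    h2 = [0.1, 0.4, -0.2, 0.05]  # hypothetical echo path, loudspeaker 2
    x1 = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    x2 = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    mic = [a + b for a, b in zip(fir(h1, x1), fir(h2, x2))]
    w1, w2, _ = two_channel_nlms(x1, x2, mic, taps=4)
    return max(abs(w - h) for w, h in zip(w1 + w2, h1 + h2))
```

With independent, noise-free inputs as in `demo`, the estimated coefficients converge to the true paths; replacing `x2` by a filtered copy of `x1` is a simple way to observe the non-uniqueness described above.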