Abstract: Session AE-2

AE-2.1
On the utilization of overshoot effects in low-delay audio coding
Aki Härmä,
Unto K Laine,
Matti Karjalainen (Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing)
In low-delay audio coding (coding delay < 5 ms) there is no time
for detailed spectral modeling in the case of brief percussive sounds, e.g.,
castanets, and the onsets of music or speech sounds. On the other hand,
it is known from psychoacoustic experiments that the ear is not
accurate near the onset of a wideband sound. In this paper, we study the
audibility of coding errors near the onsets of musical sounds in a
simulated low-delay audio codec based on frequency-warped linear prediction.
It is suggested that for many musical transients it is sufficient to
reproduce a rough temporal and spectral envelope of the original
signal during the first 5-10 ms. Preliminary listening tests support
this idea. It is proposed that the overshoot effect of hearing
could be utilized efficiently in enhancing the performance of a
low-delay audio coding scheme.
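As an illustrative aside (not taken from the paper), the following minimal Python sketch shows one common way to compute frequency-warped linear prediction coefficients, the analysis technique on which the simulated codec above is based: the unit delay is replaced by a first-order allpass section and the normal equations are solved on the warped signals. The warping coefficient, model order, and regularization term are assumptions chosen for the example.

import numpy as np
from scipy.signal import lfilter

def warped_lpc(x, order=20, lam=0.72):
    """Minimal sketch of frequency-warped linear prediction.

    The unit delay of ordinary LPC is replaced by the first-order allpass
    D(z) = (z**-1 - lam) / (1 - lam * z**-1).  A warping coefficient
    around 0.7 roughly follows the Bark scale at CD-rate sampling; the
    exact value here is an assumption for illustration.
    """
    x = np.asarray(x, dtype=float)
    # d_k = x passed through k cascaded allpass sections ("warped delays").
    warped = [x]
    d = x
    for _ in range(order):
        d = lfilter([-lam, 1.0], [1.0, -lam], d)
        warped.append(d)

    # Solve the warped normal equations: predict d_0 from d_1 ... d_order.
    R = np.array([[np.dot(warped[i + 1], warped[j + 1]) for j in range(order)]
                  for i in range(order)])
    r = np.array([np.dot(warped[i + 1], warped[0]) for i in range(order)])
    a = np.linalg.solve(R + 1e-9 * np.eye(order), r)
    return np.concatenate(([1.0], -a))   # warped prediction-error filter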
AE-2.2
Scalable Audio Coder Based on Quantizer Units of MDCT Coefficients
Akio Jin,
Takehiro Moriya (NTT Human Interface Laboratories),
Takeshi Norimatsu,
Mineo Tsushima,
Tomokazu Ishikawa (Matsushita Electric Industrial Co., Ltd.)
A scalable codec has been constructed using transform coding and basic modules for the scalable encoder and decoder.
It allows users to choose a variety of scalable configurations in the frequency domain.
The basic module is a quantizer that can quantize MDCT (Modified DCT) coefficients transformed from a variety of frequency regions.
This module mainly works at bitrates of more than 8 kbit/s.
We can also change the target frequency regions of the basic module's input-output signals in each transform frame;
i.e., we can change the scalable structure according to the nature of input signals.
In the scalable codec described here, the input-output signals are monaural and the sampling frequency is 24 kHz.
The total bit rate of this scalable codec is more than 8 kbit/s.
Subjective quality evaluation tests, mainly with musical sound sources, showed that its sound quality is better than that of an MPEG-2 Layer 3 codec at 8, 16, and 24 kbit/s when our scalable codec is constructed from 8-kbit/s basic modules.
In combination with AAC (Advanced Audio Coding), our scalable codec is expected to be adopted as an international standard in ISO/IEC MPEG-4 Audio.
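To make the idea of stacking quantizer units on MDCT coefficients concrete, here is a minimal Python sketch of a layered residual quantizer over selectable frequency regions. It is only a schematic of the scalable structure described above; the region boundaries, step sizes, and uniform quantizer are assumptions, not the paper's bit-exact modules.

import numpy as np

def quantizer_unit(coeffs, step):
    """One hypothetical 'quantizer unit': uniform quantization of the MDCT
    coefficients in its assigned frequency region."""
    idx = np.round(coeffs / step).astype(int)
    return idx, idx * step            # indices to transmit, local reconstruction

def scalable_encode(mdct_frame, regions, steps):
    """Stack quantizer units: each layer codes the residual left by the
    previous layers over its own frequency region (a sketch only)."""
    residual = np.asarray(mdct_frame, dtype=float).copy()
    layers = []
    for (lo, hi), step in zip(regions, steps):
        idx, recon = quantizer_unit(residual[lo:hi], step)
        layers.append(((lo, hi), step, idx))
        residual[lo:hi] -= recon      # later layers refine what is left
    return layers

def scalable_decode(layers, n_bins):
    """Decode any prefix of the layer list; shorter prefixes correspond to
    the lower bit-rate versions of the signal."""
    out = np.zeros(n_bins)
    for (lo, hi), step, idx in layers:
        out[lo:hi] += idx * step
    return out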
AE-2.3
An Algorithm for Compression of Diverse Speech and Audio Signals
Trevor R Trinkaus,
Mark A Clements (Georgia Institute of Technology)
A compression scheme for diverse speech and audio signals is proposed. In this
scheme, signals are analyzed with a 2-band QMF filterbank followed by the
application of a Modulated Lapped Biorthogonal Transform (MLBT) to each of the
filter bank channels. Subsequent encoding of transform coefficients is
performed using Laplacian optimized scalar and vector quantizers, whose rates
are determined by an estimated noise threshold, i.e., a masking threshold.
Listening tests show that the coder achieves a quality at 32 kbit/s that is
preferred over the ITU G.722 coder at 64 kbit/s for speech, music, and more
diverse signals consisting of speech in the presence of eventful background
sounds. Both the delay of the coder, at 40 ms, and the level of complexity are
moderate.
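For readers unfamiliar with the front end described above, the short Python sketch below shows a generic 2-band QMF analysis stage (filtering followed by decimation by two, with the high-pass obtained from the low-pass prototype via the QMF relation H1(z) = H0(-z)). The prototype filter h0 is left to the caller, and the MLBT and quantization stages of the paper are not reproduced.

import numpy as np
from scipy.signal import lfilter

def qmf_analysis(x, h0):
    """Split x into low and high bands with a QMF pair and decimate by two.

    h0 is a lowpass prototype supplied by the caller; the highpass is
    derived from it by alternating the signs of the taps (H1(z) = H0(-z)).
    """
    h0 = np.asarray(h0, dtype=float)
    h1 = h0 * (-1.0) ** np.arange(len(h0))    # QMF highpass from the prototype
    low = lfilter(h0, [1.0], x)[::2]          # filter, then keep even samples
    high = lfilter(h1, [1.0], x)[::2]
    return low, high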
AE-2.4
A New Forward Masking Model and Its Application to Perceptual Audio Coding
Yuan-Hao Huang (Room 333, Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.),
Tzi-Dar Chiueh (Room 511, Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.)
This paper presents a new forward masking model for
perceptual audio coding. This model exploits adaptation
of the peripheral sensory and neural elements in the
auditory system, which is often deemed the cause of
forward masking. The nonlinearity of the ear is modeled by
a nonlinear analog circuit described by difference equations.
We incorporate this model in the MPEG Layer III audio
coding scheme and construct a masking plane in the
frequency-time space. With some extra computations, the
new audio coding scheme can improve the sound quality
of the decoded audio signals. In our experiments,
subjective and objective sound quality measurements
show that, to achieve the same reconstructed sound
quality, the new scheme requires 12% to 23% fewer bits
than the original MPEG Layer III scheme.
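As a hedged illustration of the general idea of a masking plane in frequency-time, the sketch below lets each frame's simultaneous masking threshold decay exponentially over the following frames, so that later thresholds are raised by forward masking. The frame length, decay time constant, and dB-domain bookkeeping are assumptions standing in for the paper's adaptation-based model.

import numpy as np

def forward_masking_plane(frame_thresholds, frame_ms=13.0, tau_ms=30.0):
    """Spread per-frame simultaneous masking thresholds forward in time.

    frame_thresholds: array of shape (n_frames, n_bands), thresholds in dB.
    Each frame's threshold decays exponentially (time constant tau_ms) over
    subsequent frames; the elementwise maximum forms a simple time-frequency
    masking plane.  All constants are illustrative only.
    """
    plane = np.array(frame_thresholds, dtype=float)
    drop_per_frame = 8.686 * frame_ms / tau_ms   # dB drop per frame for exp decay
    for t in range(1, plane.shape[0]):
        plane[t] = np.maximum(plane[t], plane[t - 1] - drop_per_frame)
    return plane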
AE-2.5
Best Wavelet-Packet Bases for Audio Coding Using Perceptual and Rate-Distortion Criteria
Markus Erne,
George Moschytz (Swiss Federal Institute of Technology, Signal and Information Processing Laboratory),
Christof Faller (Swiss Federal Institute of Technology)
This paper presents a new approach to the adaptation of a wavelet filterbank based on perceptual and rate-distortion criteria. The system makes use of a wavelet-packet transform in which each subband can have an individual time segmentation. Boundary effects are avoided by using overlapping blocks of samples, so bases can be switched at every tree level without affecting other subbands. A modified psychoacoustic model using perceptual entropy controls the switching of the wavelet filterbank, and the individual time segmentation of every subband makes it possible to take advantage of temporal masking. Additionally, a rate-distortion measure can control the filterbank for lossless audio coding applications or in cases where large coding gains can be achieved without using perceptual criteria. The weights of the perceptual measure and of the rate-distortion measure can be selected individually, enabling a trade-off between lossless and perceptual coding.
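The best-basis search itself can be summarized in a few lines. The sketch below follows the standard Coifman-Wickerhauser recursion with an additive cost: a parent node is kept only if its cost is lower than the summed cost of its best children. A Haar split and an entropy-like cost are used here purely as placeholders for the paper's wavelet-packet filterbank and combined perceptual/rate-distortion measure.

import numpy as np

def split(x):
    """One Haar wavelet-packet split (a stand-in for the paper's filterbank)."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def cost(c):
    """Additive cost of a node; an entropy-like measure is used here, where
    the paper would combine perceptual entropy with a rate-distortion term."""
    p = c ** 2 / (np.sum(c ** 2) + 1e-12)
    return float(-np.sum(p * np.log2(p + 1e-12)))

def best_basis(x, max_level):
    """Keep a parent node only if its cost beats the sum of its children's
    best costs (Coifman-Wickerhauser-style search)."""
    if max_level == 0 or len(x) < 2 or len(x) % 2:
        return cost(x), [x]
    a, d = split(x)
    ca, leaves_a = best_basis(a, max_level - 1)
    cd, leaves_d = best_basis(d, max_level - 1)
    parent = cost(x)
    if parent <= ca + cd:
        return parent, [x]                   # stop splitting this subband
    return ca + cd, leaves_a + leaves_d      # keep the split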
AE-2.6
Improving Perceptual Coding of Narrowband Audio Signals at Low Rates
Hossein Najafzadeh-Azghandi,
Peter Kabal (McGill University)
This paper discusses perceptual coding of narrowband audio signals at low rates. In particular, it proposes a new error measure that shapes the noise inside the critical bands, a window-switching criterion based
on the temporal masking effect of the hearing system, a more accurate model of the simultaneous masking effect of the hearing system, perceptually based bit allocation algorithms built on two different approaches to quantization noise shaping, and a predictive vector quantization scheme to code the scale factors. The resulting coding scheme outperforms existing low-rate speech coders for non-speech signals.
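As a generic illustration (not the paper's algorithm), a perceptually driven bit allocation can be sketched as a greedy loop that repeatedly gives one bit to the band whose quantization noise most exceeds its masking threshold; the 6 dB-per-bit noise reduction and the dB-domain bookkeeping are simplifying assumptions.

import numpy as np

def allocate_bits(band_energy_db, mask_db, total_bits):
    """Greedy perceptual bit allocation (illustrative only).

    band_energy_db, mask_db: per-band signal energy and masking threshold
    in dB.  Each added bit is assumed to lower that band's quantization
    noise floor by about 6 dB; each bit goes to the band whose noise
    currently exceeds its mask by the largest margin (highest NMR).
    """
    bits = np.zeros(len(band_energy_db), dtype=int)
    noise_db = np.array(band_energy_db, dtype=float)  # 0 bits: noise ~ signal
    for _ in range(total_bits):
        nmr = noise_db - np.asarray(mask_db)          # noise-to-mask ratio
        b = int(np.argmax(nmr))
        bits[b] += 1
        noise_db[b] -= 6.02                           # ~6 dB per added bit
    return bits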
AE-2.7
Subband-Domain Filtering of MPEG Audio Signals
Chris A Lanciani,
Ronald W Schafer (Georgia Institute of Technology)
The cosine modulated filter bank is commonly used for the
time-frequency decomposition of audio signals. For example, it is a
basic element of the MPEG-1 and MPEG-2 audio coding standards. While
this filter bank is not perfectly reconstructing, it does provide for
the cancellation of aliasing components that are introduced during the
analysis decomposition. If the subband signals are to be processed,
care must be taken to preserve their properties so that the aliased
terms are still canceled in the synthesis filter bank despite the
modification.
In this paper, a framework is provided for the generation and
application of arbitrary FIR filters to signals that have been
decomposed using the MPEG filter bank.
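The general framework can be illustrated as follows: the subband-domain filter from input band l to output band k is the M-fold decimated impulse response of the cascade synthesis filter g_l, time-domain filter h, analysis filter f_k, and it is the cross-band (k != l) terms that keep the aliasing components cancelable. The Python sketch below computes such a matrix of subband filters for a generic M-band filterbank; the specific MPEG prototype filters are not included and would be supplied by the caller, and this is a sketch of the idea rather than the paper's optimized derivation.

import numpy as np

def subband_filters(h, analysis, synthesis, M):
    """Map a time-domain FIR filter h into an M-band filterbank domain.

    analysis, synthesis: arrays of shape (M, L) holding the analysis
    filters f_k and synthesis filters g_l.  The subband-domain filter
    from input band l to output band k is the M-fold decimated impulse
    response of the cascade g_l -> h -> f_k.
    """
    h = np.asarray(h, dtype=float)
    n_taps = (len(h) + analysis.shape[1] + synthesis.shape[1]) // M + 2
    H = np.zeros((M, M, n_taps))
    for l in range(M):
        gh = np.convolve(synthesis[l], h)
        for k in range(M):
            c = np.convolve(gh, analysis[k])   # composite impulse response
            dec = c[::M]                        # keep every M-th sample
            H[k, l, :min(n_taps, len(dec))] = dec[:n_taps]
    return H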