Authors:
Aki Härmä,
Unto K. Laine,
Matti Karjalainen,
Page (NA) Paper number 1254
Abstract:
In low-delay audio coding (coding delay < 5 ms) there is no time for
detailed spectral modeling of brief percussive sounds, e.g., castanets,
or of the onsets of music or speech sounds. On the other
hand, it is known from psychoacoustic experiments that the ear is not
accurate near the onset of a wideband sound. In this paper, we study
the audibility of coding errors near the onsets of musical sounds in
a simulated low-delay audio codec based on frequency-warped linear
prediction. It is suggested that for many musical transients it is
sufficient to reproduce a rough temporal and spectral envelope of the
original signal during the first 5-10 ms. Preliminary listening tests
support this idea. It is proposed that the overshoot effect of hearing
could be utilized efficiently in enhancing the performance of a low-delay
audio coding scheme.
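
As a minimal sketch of what "frequency-warped" means here: warped linear prediction is commonly described as replacing each unit delay with a first-order all-pass section, which remaps the frequency axis. The function below evaluates that remapping. The function name, the parameter name `lam`, and the value 0.7 used in the usage line are illustrative assumptions, not taken from the paper.

```python
import math

def warped_frequency(omega, lam):
    """Frequency mapping of a first-order all-pass section.

    omega: angular frequency in radians per sample (0..pi).
    lam:   warping coefficient, assumed to satisfy |lam| < 1.
    The all-pass (z^-1 - lam) / (1 - lam z^-1) has phase -nu(omega),
    where nu(omega) is the value returned here.
    """
    return omega + 2.0 * math.atan2(lam * math.sin(omega),
                                    1.0 - lam * math.cos(omega))

# Usage: with a positive coefficient, low frequencies are stretched,
# so they receive more of the warped frequency axis.
nu = warped_frequency(0.5, 0.7)
```

With `lam = 0` the mapping is the identity, and the endpoints 0 and pi are left in place for any admissible coefficient.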
Authors:
Akio Jin,
Takehiro Moriya,
Takeshi Norimatsu,
Mineo Tsushima,
Tomokazu Ishikawa,
Page (NA) Paper number 1605
Abstract:
A scalable codec has been constructed by using transform coding and
the basic modules for scalable encoder and decoder. It allows users
to choose a variety of scalable configurations in the frequency domain.
The basic module is a quantizer that can quantize MDCT (Modified DCT)
coefficients transformed from a variety of frequency regions. This
module mainly works at bitrates of more than 8 kbit/s. We can also
change the target frequency regions of the basic module's input-output
signals in each transform frame; i.e., we can change the scalable structure
according to the nature of input signals. In the scalable codec described
here, the input-output signals are monaural and the sampling frequency
is 24 kHz. The total bit rate of this scalable codec is more than 8
kbit/s. Subjective quality evaluation tests, mainly for musical sound
sources, showed that its sound quality is better than that of an MPEG-2 Layer III
codec at 8, 16, and 24 kbit/s when our scalable codec is constructed
of 8-kbit/s basic modules. In combination with AAC (Advanced Audio
Coding), our scalable codec will be chosen as an international standard
in ISO/IEC-MPEG-4/Audio.
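
The abstract's basic module operates on MDCT coefficients. The sketch below shows only the transform itself with 50% overlap and a sine window, which is an assumed choice; the paper's quantizer, frame length, and window are not modeled. All function names are hypothetical.

```python
import math

def sine_window(N):
    """Length-2N sine window; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(frame):
    """Forward MDCT: 2N input samples -> N coefficients."""
    N = len(frame) // 2
    n0 = 0.5 + N / 2
    return [sum(frame[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    N = len(coeffs)
    n0 = 0.5 + N / 2
    return [(2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                            for k in range(N))
            for n in range(2 * N)]

def analysis_synthesis(x, N):
    """Window, transform, inverse transform, window again, and overlap-add.

    Frames of length 2N advance by N samples. Time-domain aliasing cancels
    wherever two frames overlap, so only the first and last N samples of
    the output are left incomplete.
    """
    w = sine_window(N)
    out = [0.0] * len(x)
    for start in range(0, len(x) - 2 * N + 1, N):
        frame = [w[n] * x[start + n] for n in range(2 * N)]
        y = imdct(mdct(frame))
        for n in range(2 * N):
            out[start + n] += w[n] * y[n]
    return out

# Usage: without quantization the interior samples are reconstructed.
x = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
y = analysis_synthesis(x, 4)
```

In a codec, quantization would sit between `mdct` and `imdct`; a scalable structure like the one described would apply it to selected ranges of the coefficient index `k`.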
Authors:
Trevor R Trinkaus,
Mark A Clements,
Page (NA) Paper number 2026
Abstract:
A compression scheme for diverse speech and audio signals is proposed.
In this scheme, signals are analyzed with a 2-band QMF filterbank followed
by the application of a Modulated Lapped Biorthogonal Transform (MLBT)
to each of the filter bank channels. Subsequent encoding of transform
coefficients is performed using Laplacian optimized scalar and vector
quantizers, whose rates are determined by an estimated noise threshold,
i.e., masking threshold. Listening tests show that the coder achieves
a quality at 32 kbit/s that is preferred over the ITU G.722 coder
at 64 kbit/s, for speech, music, and more diverse signals consisting
of speech in the presence of eventful background sounds. Both the delay
of the coder, at 40 ms, and the level of complexity are moderate.
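
To illustrate the first stage of such a scheme, here is a minimal sketch of a 2-band QMF split and its inverse. It uses the 2-tap Haar pair, which is an assumption chosen for brevity; the paper does not state its filters, and practical coders use much longer ones. The MLBT and quantization stages are not shown, and the function names are hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)

def qmf_analysis(x):
    """Split an even-length signal into low and high bands at half rate."""
    half = len(x) // 2
    low = [_S * (x[2 * n] + x[2 * n + 1]) for n in range(half)]
    high = [_S * (x[2 * n] - x[2 * n + 1]) for n in range(half)]
    return low, high

def qmf_synthesis(low, high):
    """Recombine the two half-rate bands into the full-rate signal."""
    y = []
    for l, h in zip(low, high):
        y.append(_S * (l + h))
        y.append(_S * (l - h))
    return y

# Usage: analysis followed by synthesis reproduces the input.
x = [1.0, 2.0, 0.5, -1.0, 3.0, 3.0]
low, high = qmf_analysis(x)
y = qmf_synthesis(low, high)
```

Each band would then be passed to its own lapped transform, so the two bands can use different transform sizes or bit allocations.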
Authors:
Yuan-Hao Huang, Room 333, Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.
Tzi-Dar Chiueh, Room 511, Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.
Page (NA) Paper number 1363
Abstract:
This paper presents a new forward masking model for perceptual audio
coding. This model exploits adaptation of the peripheral sensory and
neural elements in the auditory system, which is often deemed the
cause of forward masking. Nonlinearity of the ear is modeled by a nonlinear
analog circuit with difference equations. We incorporate this model
in the MPEG Layer III audio coding scheme and construct a masking plane
in the frequency-time space. With some extra computations, the new
audio coding scheme can improve the sound quality of the decoded audio
signals. In our experiments, subjective and objective sound quality
measurements show that, to achieve the same reconstructed sound quality,
the new scheme requires 12% to 23% fewer bits than the original MPEG
Layer III scheme.
Authors:
Markus Erne,
George Moschytz,
Christof Faller,
Page (NA) Paper number 1442
Abstract:
This paper presents a new approach to the adaptation of a wavelet filterbank
based on perceptual and rate-distortion criteria. The system makes
use of a wavelet-packet transform where each subband can have an individual
time-segmentation. Boundary effects can be avoided by using overlapping
blocks of samples and therefore switching bases is possible at every
tree-level without affecting other subbands. A modified psychoacoustic
model using perceptual entropy can control the switching of the wavelet
filterbank, and the individual time-segmentation of every subband makes
it possible to exploit temporal masking. Additionally, a rate-distortion
measure can control the filterbank for lossless audio coding applications
or in cases where large coding gains can be achieved without using
perceptual criteria. The weight of the perceptual measure as well as
the weight of the rate-distortion measure can be selected individually,
making it possible to trade off lossless coding against perceptual coding.
Authors:
Hossein Najafzadeh-Azghandi,
Peter Kabal,
Page (NA) Paper number 1779
Abstract:
This paper discusses perceptual coding of narrowband audio signals
at low rates. In particular, it proposes a new error measure which
shapes the noise inside the critical bands, a window switching criterion
based on the temporal masking effect of the hearing system, a more
accurate model of the simultaneous masking effect of the hearing system,
perceptually-based bit allocation algorithms based on two different
approaches towards quantization noise shaping, and a predictive vector
quantization scheme to code the scale factors. The resulting coding
scheme outperforms existing low-rate speech coders for non-speech signals.
Authors:
Chris A Lanciani,
Ronald W Schafer,
Page (NA) Paper number 1994
Abstract:
The cosine modulated filter bank is commonly used for the time-frequency
decomposition of audio signals. For example, it is a basic element
of the MPEG-1 and MPEG-2 audio coding standards. While this filter
bank is not perfectly reconstructing, it does provide for the cancellation
of aliasing components that are introduced during the analysis decomposition.
If the subband signals are to be processed, care must be taken to preserve
the properties of the subband signals such that the aliased terms will
be canceled successfully in the synthesis filter bank despite the modification
of the subband signals. In this paper, a framework is provided for
the generation and application of arbitrary FIR filters to signals
that have been decomposed using the MPEG filter bank.