1:00, SPEECH-L7.1
A CANDIDATE PROPOSAL FOR A 3GPP ADAPTIVE MULTI-RATE WIDEBAND SPEECH CODEC
C. ERDMANN, P. VARY, K. FISCHER, W. XU, M. MARKE, T. FINGSCHEIDT, I. VARGA, M. KAINDL, C. QUINQUIS, B. KOVESI, D. MASSALOUX
This paper describes an adaptive multi-rate wideband (AMR-WB) speech codec
proposed for the GSM system and also for
the evolving Third Generation (3G) mobile
speech services. The speech codec is based on SB-CELP
(Subband-Code-Excited Linear Prediction)
with five modes operating at bit rates from 24 kbit/s down to 9.1 kbit/s.
The respective channel coding schemes are based on RSC (Recursive Systematic
Code) and UEP (Unequal Error Protection). Both the source and channel codecs are designed to be
as homogeneous as possible to guarantee robust transmission over current and future mobile radio channels.
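As a rough illustration of the channel-coding building block named in the
abstract, the sketch below implements a generic rate-1/2 recursive systematic
convolutional (RSC) encoder in Python; the (7,5)-octal generator polynomials
are illustrative defaults and are not taken from the 3GPP candidate.

# Minimal RSC encoder sketch. The generator polynomials below
# (feedback 1 + D + D^2, parity 1 + D^2) are illustrative only.
def rsc_encode(bits, feedback=(1, 1, 1), parity=(1, 0, 1)):
    """Encode a bit sequence; returns (systematic, parity) bit streams."""
    state = [0] * (len(feedback) - 1)      # shift-register contents
    sys_out, par_out = [], []
    for b in bits:
        fb = b                             # feedback bit = input XOR feedback taps
        for tap, s in zip(feedback[1:], state):
            fb ^= tap & s
        p = parity[0] & fb                 # parity bit from the feedforward taps
        for tap, s in zip(parity[1:], state):
            p ^= tap & s
        sys_out.append(b)                  # systematic bit = input bit
        par_out.append(p)
        state = [fb] + state[:-1]          # shift the register
    return sys_out, par_out

sys_bits, par_bits = rsc_encode([1, 0, 1, 1, 0, 0, 1])
print(sys_bits, par_bits)

In a UEP arrangement, the perceptually most sensitive source bits would be
carried in a more strongly protected class (for example, with less puncturing
of the parity stream) than the remaining bits.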
1:20, SPEECH-L7.2
AN EMBEDDED ADAPTIVE MULTI-RATE WIDEBAND SPEECH CODER
A. MCCREE, T. UNNO, A. ANANDAKUMAR, A. BERNARD, E. PAKSOY
This paper presents a multi-rate wideband speech coder with bit rates
from 8 to 32 kb/s. The coder uses a split-band approach, where the
input signal, sampled at 16 kHz, is split into two equal frequency
bands from 0-4 kHz and 4-8 kHz, each of which is decimated to an 8 kHz
sampling rate. The lower band is coded using the Adaptive Multi-rate
(AMR) family of high-quality narrowband speech coders, while the
higher band is represented by a simple but effective parametric model.
A complete solution including this wideband speech coder, channel
coding for various GSM channels, and dynamic rate adaptation, easily
passed all Selection Rules and ranked second overall in the recent
3GPP AMR Wideband Selection Testing. Besides high performance,
additional advantages of the embedded split-band approach include ease
of implementation, reduced complexity, and simplified interoperation
with narrowband speech coders.
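A minimal sketch of such a split-band front end (an illustration, not the
authors' implementation) is given below: a 16 kHz input is split into 0-4 kHz
and 4-8 kHz bands with a half-band FIR pair and each band is decimated to an
8 kHz sampling rate; the filter design and length are assumed for illustration.

# Two-band split of a 16 kHz signal into 0-4 kHz and 4-8 kHz components,
# each decimated to 8 kHz. Filter length/design are illustrative choices.
import numpy as np
from scipy.signal import firwin, lfilter

def split_band(x, numtaps=65):
    """Return (low_band, high_band), each at half the input sampling rate."""
    h_lp = firwin(numtaps, 0.5)                    # half-band lowpass (cutoff = fs/4)
    h_hp = h_lp * (-1.0) ** np.arange(numtaps)     # modulate lowpass to highpass
    low = lfilter(h_lp, 1.0, x)[::2]               # filter, then decimate by 2
    high = lfilter(h_hp, 1.0, x)[::2]              # 4-8 kHz band aliases to baseband
    return low, high

x = np.random.randn(16000)                         # one second of test signal at 16 kHz
low, high = split_band(x)
print(low.shape, high.shape)                       # (8000,) (8000,)

In the coder described above, the low band would then be passed to the
narrowband AMR coder and the high band to the parametric model.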
1:40, SPEECH-L7.3
OPTIMAL ESTIMATION OF SUBBAND SPEECH FROM NONUNIFORM NON-RECURRENT SIGNAL-DRIVEN SPARSE SAMPLES
P. PENEV, L. IORDANOV
Speech signals are composed of auditory objects that are
localized in time but can appear anywhere in the record. We
introduce a strategy for non-recurrent, irregular, signal-driven
sampling and subsequent maximum-likelihood interpolation of
speech subbands that achieves object constancy: the
representation of an auditory object is precisely locked to
the timing of its features, but is otherwise constant.
Moreover, the reconstruction fidelity can be traded flexibly
for sampling rate, over a broad range of signal-to-noise
ratios and application requirements. In an experiment with
wide-band speech, we find a regime in the rate/distortion
curve that has almost perfect reconstruction at a rate
substantially lower than the respective Nyquist rate.
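One way such a reconstruction can be posed (an assumption for illustration,
not the authors' formulation) is sketched below: under white Gaussian noise,
maximum-likelihood interpolation of a bandlimited subband model from
nonuniform samples reduces to a least-squares fit, here with a truncated
Fourier basis standing in for the subband model.

# Least-squares (ML under white Gaussian noise) interpolation of a signal
# from irregular, sparse samples using a truncated Fourier basis.
import numpy as np

def ml_interpolate(t_samp, y_samp, t_grid, n_harmonics=20, period=1.0):
    """Fit a truncated Fourier series to sparse samples and evaluate it on t_grid."""
    def design(t):
        cols = [np.ones_like(t)]
        for k in range(1, n_harmonics + 1):
            cols.append(np.cos(2 * np.pi * k * t / period))
            cols.append(np.sin(2 * np.pi * k * t / period))
        return np.column_stack(cols)
    coeffs, *_ = np.linalg.lstsq(design(t_samp), y_samp, rcond=None)
    return design(t_grid) @ coeffs

rng = np.random.default_rng(0)
t_samp = np.sort(rng.uniform(0.0, 1.0, 120))       # irregular, signal-driven sample times
y_samp = np.sin(2 * np.pi * 5 * t_samp) + 0.05 * rng.standard_normal(120)
y_hat = ml_interpolate(t_samp, y_samp, np.linspace(0.0, 1.0, 1000))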
2:00, SPEECH-L7.4
VARIABLE-SIZE VECTOR ENTROPY CODING OF SPEECH AND AUDIO
Y. SHOHAM
Many modern analog media coders employ some form of entropy
coding (EC). Usually, a simple per-letter EC is used to keep
the coder's complexity and price low. In some coders, individual
symbols are grouped into small fixed-size vectors before EC is
applied. In this work we extend this approach to form Variable-
Size Vector EC (VSVEC), in which vector sizes may range from 1 to
several hundred. The method is complexity-constrained in the sense
that the vector size is always as large as a pre-set complexity
limit allows. The idea is studied in the framework of an MDCT
transform coder. It is shown experimentally, using diverse audio
material, that a rate reduction of about 37% can be achieved. The
method is not specific to MDCT coding, however, and can be
incorporated in various speech, audio, image and video coders.
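A toy sketch of the complexity-constrained grouping idea (not the paper's
actual VSVEC algorithm) follows: consecutive quantized symbols are merged
into one vector for joint entropy coding as long as the joint-alphabet size,
used here as a stand-in for complexity, stays below a preset cap; the
alphabet sizes and the cap are assumed values.

# Complexity-constrained variable-size vector formation: grow each vector
# while the joint alphabet (product of per-symbol alphabet sizes) fits the cap.
import numpy as np

def group_variable_size(symbols, alphabet_sizes, complexity_cap=4096):
    """Split a symbol stream into variable-size vectors under a complexity limit."""
    vectors, start = [], 0
    while start < len(symbols):
        size, joint = 0, 1
        while (start + size < len(symbols)
               and joint * alphabet_sizes[start + size] <= complexity_cap):
            joint *= alphabet_sizes[start + size]
            size += 1
        size = max(size, 1)                        # always emit at least one symbol
        vectors.append(tuple(symbols[start:start + size]))
        start += size
    return vectors

rng = np.random.default_rng(1)
syms = rng.integers(0, 4, 256).tolist()            # toy quantized transform coefficients
alph = [4] * 256                                   # nominal 2-bit alphabet per symbol
vecs = group_variable_size(syms, alph)
print(len(vecs), "vectors, mean size", 256 / len(vecs))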
2:20, SPEECH-L7.5
WIDEBAND SPEECH AND AUDIO CODING USING GAMMATONE FILTER BANKS
E. AMBIKAIRAJAH, J. EPPS, L. LIN
Considerable research attention has been directed towards speech and audio coding algorithms capable of producing high-quality coded speech and audio; however, few of these use signal representations that account for temporal as well as spectral detail. This paper presents a new technique for 16 kHz wideband speech and audio coding, whereby analysis and synthesis are performed using a linear-phase gammatone filter bank. The outputs of these critical-band filters are processed to obtain a series of pulse trains that represent neural firing. Auditory masking is then applied to reduce the number of pulses, producing a more compact time-frequency parameterization. The critical-band gains and the pulse amplitudes and positions are then coded using a combination of non-uniform quantization, arithmetic coding and vector quantization. This coding paradigm produces high-quality coded speech and audio, is based upon well-known models of the auditory system, is highly scalable, and has moderate complexity.
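A minimal sketch of the analysis stage is given below; it assumes FIR
gammatone approximations, log-spaced centre frequencies, and half-wave
rectification as a crude stand-in for pulse extraction, and it does not
reproduce the paper's linear-phase design or the masking and quantization
stages.

# 4th-order gammatone analysis bank with half-wave rectified outputs.
# Centre frequencies, filter length and order are illustrative choices.
import numpy as np
from scipy.signal import lfilter

def gammatone_fir(fc, fs, order=4, duration=0.016):
    """FIR approximation of a gammatone impulse response centred at fc."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)        # Glasberg-Moore ERB bandwidth
    g = t ** (order - 1) * np.exp(-2 * np.pi * 1.019 * erb * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def analyse(x, fs=16000, n_channels=21):
    """Filter x through the gammatone bank and half-wave rectify each output."""
    fcs = np.geomspace(100.0, 7000.0, n_channels)  # log-spaced centre frequencies
    outputs = [lfilter(gammatone_fir(fc, fs), 1.0, x) for fc in fcs]
    return [np.maximum(o, 0.0) for o in outputs]   # crude pulse-like representation

x = np.random.randn(16000)
channels = analyse(x)
print(len(channels), channels[0].shape)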
2:40, SPEECH-L7.6
FREQUENCY SELECTIVITY VIA THE SPENT METHODOLOGY FOR WIDEBAND SPEECH COMPRESSION
M. KOKES, J. GIBSON
In speech and audio coding, frequency selectivity of the basis
functions is an important property of the codec. The more precise the
frequency selectivity, the lower the chance of audible coding artifacts
due to uncanceled aliasing. In this work, we use
Campbell's coefficient rate and the spectral entropy (SpEnt) of the
source random process as a guide to formulate adaptive nonuniform
modulated lapped biorthogonal transforms (NMLBT). The use of the
NMLBT allows for efficient implementation of a time-varying transform
that possesses both good frequency and time resolution at all
times. By coupling the SpEnt methodology with the MLBT, we
develop band combining strategies to produce an adaptive NMLBT. This
new frequency selection process comprises a non-linear approximation
method to determine the best N basis functions for a speech frame.
We implement a wideband speech compression scheme based on our
strategy and verify its improved performance at 16 and 24 kbps.
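The sketch below illustrates the flavour of an entropy-driven non-linear
approximation; a plain DFT stands in for the adaptive NMLBT, and the mapping
from spectral entropy to the number N of retained coefficients is an assumed
heuristic, not the authors' method.

# Per-frame spectral entropy sets N; the N largest-magnitude transform
# coefficients are kept (non-linear approximation on an assumed DFT basis).
import numpy as np

def spectral_entropy(frame):
    """Entropy (bits) of the normalized power spectrum of one frame."""
    p = np.abs(np.fft.rfft(frame)) ** 2
    p = p / (np.sum(p) + 1e-12)
    return -np.sum(p * np.log2(p + 1e-12))

def best_n_coefficients(frame, n_max=None):
    """Keep the N largest-magnitude DFT coefficients, with N driven by spectral entropy."""
    spec = np.fft.rfft(frame)
    h = spectral_entropy(frame)
    n = int(np.clip(2 ** h, 1, n_max or len(spec)))  # 2**entropy ~ active bins (assumed mapping)
    keep = np.argsort(np.abs(spec))[-n:]
    sparse = np.zeros_like(spec)
    sparse[keep] = spec[keep]
    return np.fft.irfft(sparse, len(frame)), n

frame = np.sin(2 * np.pi * 440 * np.arange(320) / 16000) + 0.1 * np.random.randn(320)
recon, n_kept = best_n_coefficients(frame)
print("kept", n_kept, "of", len(np.fft.rfft(frame)), "coefficients")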