Authors:
Jason D Kridner,
Mark T Nadeski,
Pedro R Gelabert,
Page (NA) Paper number 2137
Abstract:
New audio compression algorithms and non-volatile flash memory technology
have enabled the creation of portable solid-state personal audio players.
This paper presents a low-power portable audio system based on the
Texas Instruments TMS320C5000 DSP family. This system is designed to
play music and other audio media stored on flash memory cards that
can hold over an hour of CD-quality music. The flash card provides
higher audio quality than a cassette tape, yet is smaller and more
durable than a CD. The music or audio material downloaded to the flash
is obtained from licensed distributors, either through the internet
or through kiosks set up in retail outlets. All the audio decoding and
watermarking required by this system are handled by the TMS320C5000
DSP. System performance characteristics are also presented.
Authors:
Hyen-O Oh,
Sung-Youn Kim,
Dae-Hee Youn,
Il-Whan Cha,
Page (NA) Paper number 1594
Abstract:
In this study, new implementation techniques for a real-time MPEG-2
audio encoding system are presented. The system is developed using
general-purpose DSPs. It consists of one master unit and five slave
units, and its structure is based on our earlier work. Two
fast algorithms are developed and applied to the most compute-intensive
routines of the encoding process. These algorithms play a key role
in improving overall system performance. The implemented system is
designed to encode audio signals into an MPEG-2 Layer II bitstream with
full configurations up to 5.1 channels and 640 kbps, and is intended to
support state-of-the-art quality. The generated bitstream can be stored
on a PC hard disk or sent to an integration system to be multiplexed
with the corresponding video bitstream.
Authors:
Sassan Ahmadi,
Page (NA) Paper number 2193
Abstract:
An improved harmonic sinusoidal model is presented, where the underlying
sine wave amplitudes and phases are efficiently represented using a
combination of linear prediction, linear phase alignment, all-pass
filtering, and spectral sampling in the residual domain. The analysis
and synthesis systems are introduced, and the derivation and encoding
of each model parameter are discussed. Performance analysis on a large
database indicates effective modeling of the sinusoidal parameters.
A variable-rate sinusoidal coder operating at an average bit rate of
1.75 kbps, based on the proposed model, has been developed, yielding
reproduced speech of good quality, intelligibility, and naturalness.
The proposed model may find applications in low bit rate speech coding
in high capacity wireless communication systems.
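
The abstract does not give the model equations. As a rough illustration
of the plain harmonic sinusoidal model that such coders build on, a frame
can be synthesized as a sum of cosines at integer multiples of the pitch
frequency. The sketch below assumes constant amplitudes and phases within
the frame. The function name and all parameter values are hypothetical,
and none of the paper's linear prediction, phase alignment, or all-pass
filtering steps is reproduced.

```python
import math

def harmonic_synthesis(f0_hz, sample_rate_hz, amplitudes, phases, num_samples):
    """Return num_samples of s[n] = sum_k A_k * cos(k * w0 * n + phi_k).

    amplitudes[k-1] and phases[k-1] belong to the k-th harmonic.
    This is the generic harmonic model, not the paper's improved model.
    """
    w0 = 2.0 * math.pi * f0_hz / sample_rate_hz  # fundamental, radians/sample
    samples = []
    for n in range(num_samples):
        value = 0.0
        for k, (a_k, phi_k) in enumerate(zip(amplitudes, phases), start=1):
            value += a_k * math.cos(k * w0 * n + phi_k)
        samples.append(value)
    return samples

# Hypothetical frame: 100 Hz pitch at 8 kHz sampling, two harmonics.
frame = harmonic_synthesis(100.0, 8000.0, [1.0, 0.5], [0.0, 0.0], 81)
```

With these illustrative values the pitch period is 80 samples, so the
waveform repeats at sample 80.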
Authors:
Ali E Ertan,
Emre B Aksu,
Hakki G Ilk,
Haydar Karci,
Önder Karpat,
Taner Kolçak,
Levent Şendur,
Mubeccel Demirekler,
Ahmet Enis Çetin,
Page (NA) Paper number 1758
Abstract:
In this paper, a fixed-point Variable Bit-Rate (VBR) Mixed Excitation
Linear Predictive Coding (MELP) vocoder is presented. The VBR-MELP
vocoder is implemented on a TMS320C54x and achieves quality virtually
indistinguishable from that of the federal standard MELP at bit rates
between 1.0 and 1.6 kb/s. The backbone of the VBR-MELP vocoder is similar
to that of the federal standard MELP. It uses a novel sub-band based voice
activity detector in the back end of the encoder to discriminate background
noise from speech activity. Since the proposed detector uses only parameters
extracted in the encoder, its computational complexity is very low.
Authors:
Fenghua Liu,
Ryan Heidari,
Page (NA) Paper number 3020
Abstract:
This paper presents an algebraic vector quantized codebook excited
linear prediction (AVQ-CELP) speech codec. The objective is to enhance
the half rate mode of IS-127, the enhanced variable rate codec (EVRC).
In the AVQ-CELP scheme, only the perceptually important components are
encoded, and the components are selected in a way similar
to ACELP. An open-loop procedure is used to select the sub-vectors.
The selected sub-vectors are concatenated and vector quantized. An
analysis-by-synthesis strategy is used to determine the optimal excitation.
The generalized Lloyd algorithm (GLA) is used to optimize the AVQ codebook.
To improve the synthesis quality of voiced frames, a two-pulse
version of ACELP is used in strongly voiced frames. The proposed
algorithm was incorporated in the Nokia CDMA handset prototype. In
collaboration with SK Telecom, a field test was performed
in Korea to evaluate the performance of the proposed AVQ algorithm.
The results indicate a considerable improvement relative to the standard
EVRC operating at the maximum half-rate.
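
The abstract names the generalized Lloyd algorithm (GLA) as the codebook
optimizer but gives no training details. The following is a minimal sketch
of the textbook GLA on generic vectors, alternating nearest-codeword
partitioning with centroid updates under squared-error distortion. The
function names, the toy training set, and the initial codebook are all
hypothetical, and the perceptual weighting or analysis-by-synthesis error
measure a CELP codec would use is not modeled.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, vector):
    """Index of the codeword closest to vector."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(codebook[i], vector))

def mean_distortion(training, codebook):
    """Average squared error when each vector maps to its nearest codeword."""
    total = sum(squared_distance(v, codebook[nearest(codebook, v)])
                for v in training)
    return total / len(training)

def generalized_lloyd(training, initial_codebook, iterations=20):
    """Refine a codebook by repeated partition and centroid steps."""
    codebook = [list(c) for c in initial_codebook]
    for _ in range(iterations):
        # Partition step: assign every training vector to its nearest codeword.
        cells = [[] for _ in codebook]
        for v in training:
            cells[nearest(codebook, v)].append(v)
        # Centroid step: replace each codeword by the mean of its cell.
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                dim = len(cell[0])
                codebook[i] = [sum(v[d] for v in cell) / len(cell)
                               for d in range(dim)]
    return codebook

# Toy training set (hypothetical): two well-separated clusters in 2-D.
training = [(0, 0), (0, 2), (2, 0), (2, 2),
            (10, 10), (10, 12), (12, 10), (12, 12)]
initial = [(0, 0), (12, 12)]
trained = generalized_lloyd(training, initial)
before = mean_distortion(training, initial)
after = mean_distortion(training, trained)
```

On this toy data the codewords move to the two cluster centroids, and the
mean distortion does not increase from one iteration to the next, which is
the property that makes GLA usable for codebook design.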
Authors:
Hong Kook Kim, AT&T Labs Research, Rm. E148, 180 Park Avenue, Florham Park NJ 07932, USA (USA)
Mi Suk Lee, Dept. of Electrical Eng., Korea Advanced Institute of Science and Technology, 373-1 Kusong-Dong, Yusong-Gu, Taejon 305-701, Korea (Korea)
Hwang Soo Lee, Dept. of Electrical Eng., Korea Advanced Institute of Science and Technology, 373-1 Kusong-Dong, Yusong-Gu, Taejon 305-701, Korea (Korea)
Page (NA) Paper number 1329
Abstract:
In this paper, we propose an adaptive fixed code-excited linear prediction
(AF-CELP) speech coder operating at 4 kbps. By exploiting the fact
that the fixed codebook contribution to the speech signal is periodic,
like the corresponding adaptive codebook contribution, the adaptive fixed
codebook model efficiently represents excitation signals. To
overcome the quality degradation caused by the coarse quantization
of the excitation, a paired-pulse algebraic codebook structure is also
applied to the excitation model. Additionally, pitch prefiltering,
noise spreading, and harmonic enhancement techniques are adopted
in the decoding process. Spectrogram reading and informal listening
tests showed that the AF-CELP coder reproduces high-quality speech.
Authors:
Amitava Das,
Andy DeJaco,
Sharath Manjunath,
Ananth Ananthapadmanabhan,
Jeff Huang,
Eddie Choy,
Page (NA) Paper number 3014
Abstract:
The speech signal consists of a time-varying ensemble of different
types of segments with distinct characteristics, which require different
degrees of coding resolution in order to retain an overall high voice
quality. A fixed-rate coder can capture such time-varying characteristics
only if it operates at a high enough bit rate. At low bit rates, a fixed-rate
coder will not be able to capture all of these various segments well
and will fail to render high voice quality. A multimode variable bit
rate (VBR) coder uses an arsenal of modes, operating at different bit
rates. These modes are designed to represent these different speech
segments optimally with the right amount of coding resolution. Thus,
a multimode VBR codec adapts the coding mechanism to the input speech
and delivers high quality at low (average) rates. This paper presents
the essential framework and the unique advantages of a multimode VBR
codec and suggests algorithms for the different modes.
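
The "low (average) rate" of a multimode coder is the usage-weighted mean
of its per-mode rates. The sketch below works through that arithmetic.
The mode names, per-mode rates, and usage fractions are hypothetical
values chosen for illustration and are not taken from the paper.

```python
def average_bit_rate(mode_rates_kbps, mode_fractions):
    """Usage-weighted mean rate: sum over modes of rate * fraction of frames."""
    if abs(sum(mode_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("mode fractions must sum to 1")
    return sum(mode_rates_kbps[mode] * fraction
               for mode, fraction in mode_fractions.items())

# Hypothetical modes and rates (kbps); not the paper's configuration.
rates = {"silence": 1.0, "unvoiced": 2.0, "voiced": 4.0, "transient": 8.0}
# Hypothetical share of frames coded in each mode.
usage = {"silence": 0.4, "unvoiced": 0.2, "voiced": 0.3, "transient": 0.1}

avg_kbps = average_bit_rate(rates, usage)  # about 2.8 kbps
```

In this example the peak rate is 8 kbps, yet the average stays near
2.8 kbps because the expensive mode is used for only a small share of frames.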
Authors:
Costas C.S. Xydeas, University of Manchester, UK (U.K.)
Thomas M Chapman, University of Manchester, UK (U.K.)
Page (NA) Paper number 2340
Abstract:
Current parametric speech coding schemes can achieve high communications
quality speech at bit rates in the range of 2.4 to 1.5 kbits/sec. Most
schemes sample and quantise, at regular intervals, the "tracks in time"
generated by the parameters of the speech production model. As a result,
reconstructed "parameter tracks" do not evolve "smoothly" with time.
Furthermore, no advantage is taken of the "linguistic event" nature
of speech. In this paper, model parameter "time tracks" are split into
non-overlapping speech "event" related segments. These segment-based
evolutions of model parameters are then vector quantised to provide
a smooth and subjectively meaningful reconstruction at the receiver.
The paper presents an application of this generic segmental speech
model quantisation approach to a 1.5 kbits/sec Prototype Interpolation
Coding (PIC) system. Results indicate that the proposed methodology
can almost halve the bit rate of this PIC system while preserving overall
recovered speech quality.