Abstract: Session ITT-6 |
|
ITT-6.1
|
A DSP Powered Solid State Audio System
Jason D Kridner,
Mark T Nadeski,
Pedro Gelabert (Texas Instruments Incorporated)
New audio compression algorithms and non-volatile flash
memory technology have enabled the creation of portable
solid-state personal audio players. This paper
presents a low-power portable audio system based on the
Texas Instruments TMS320C5000 DSP family. This system
is designed to play music and other audio media stored
on flash memory cards that can hold over an hour of CD
quality music. The flash card provides higher audio
quality than a cassette tape, yet is smaller and more
durable than a CD. The music or audio material
downloaded to the flash is obtained from licensed
distributors, either through the internet or through
kiosks set up in retail outlets. All the audio decoding
and watermarking required by this system are handled by
the TMS320C5000 DSP. System performance
characteristics are also presented.
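
As a rough illustration of the storage claim above, the following sketch computes the number of bytes needed for an hour of audio. The 128 kbit/s compressed rate is an assumption chosen for illustration; the abstract does not state the codec bit rate or the card capacity. The uncompressed figure uses the standard CD audio format (44.1 kHz, 16 bits, 2 channels).

```python
# Rough storage arithmetic for one hour of audio.
# ASSUMED_COMPRESSED_BPS is a hypothetical value, not taken from the paper.

CD_PCM_BPS = 44_100 * 16 * 2       # uncompressed CD audio: 1,411,200 bit/s
ASSUMED_COMPRESSED_BPS = 128_000   # hypothetical compressed bit rate


def storage_bytes(bits_per_second: int, seconds: int) -> int:
    """Return the number of bytes needed to store a constant-rate stream."""
    return bits_per_second * seconds // 8


ONE_HOUR = 60 * 60
compressed = storage_bytes(ASSUMED_COMPRESSED_BPS, ONE_HOUR)
uncompressed = storage_bytes(CD_PCM_BPS, ONE_HOUR)
print(f"compressed:   {compressed / 1e6:.1f} MB")
print(f"uncompressed: {uncompressed / 1e6:.1f} MB")
```

Under this assumed rate, an hour of audio needs roughly one eleventh of the space of the uncompressed CD stream, which is what makes flash storage practical for a portable player.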
|
ITT-6.2
|
New Implementation Techniques Of A Real-Time MPEG-2 Audio Encoding System
Hyen-O Oh,
Sung-Youn Kim,
Dae-Hee Youn,
Il-Whan Cha (ASSP Lab., Dept. of Elect. Eng., Yonsei Univ.)
In this study, new implementation techniques of a real-time
MPEG-2 audio encoding system are presented.
The system is built from general-purpose DSPs.
It consists of one master unit and five slave units,
and its structure is based on our earlier work.
Two fast algorithms are developed and applied to the most
compute-intensive routines of the encoding process. These
algorithms play a key role in improving overall system
performance. The implemented system is designed to encode
audio signals into an MPEG-2 Layer II bitstream with the full
range of configurations up to 5.1 channels and 640 kbps, and is intended
to support state-of-the-art quality. The generated bitstream can
be stored on a PC hard disk or sent to an integration system
to be multiplexed with the corresponding video bitstream.
|
ITT-6.3
|
An Improved Residual-Domain Phase/Amplitude Model for Sinusoidal Coding of Speech at Very Low Bit Rates: A Variable Rate Scheme
Sassan Ahmadi (Nokia Mobile Phones, Inc.)
An improved harmonic sinusoidal model is presented,
where the underlying sine wave amplitudes and phases
are efficiently represented using a combination of
linear prediction, linear phase alignment, all-pass
filtering, and spectral sampling in the residual-domain.
The analysis and synthesis systems are introduced and
the derivation and encoding of each model parameter are discussed.
Performance analysis on a large database indicates effective
modeling of the sinusoidal parameters. A variable-rate sinusoidal
coder operating at an average bit rate of 1.75 kbps, based on the
proposed model, has been developed, yielding reproduced
speech of good quality, intelligibility, and naturalness.
The proposed model may find applications in low bit rate speech
coding in high capacity wireless communication systems.
|
ITT-6.4
|
Implementation of an Enhanced Fixed Point Variable Bit-Rate MELP Vocoder on TMS320C549
Ali E Ertan,
Emre B Aksu (Tubitak-BILTEN),
Hakki G Ilk (Ankara University - Electrical and Electronics Engineering Department),
Haydar Karci,
Onder Karpat,
Taner Kolcak,
Levent Sendur,
Mubeccel Demirekler (Tubitak-BILTEN),
Ahmet E Cetin (Bilkent University - Electrical and Electronics Engineering Department)
In this paper, a fixed-point Variable Bit-Rate (VBR) Mixed Excitation Linear Predictive Coding (MELP)
vocoder is presented. The VBR-MELP vocoder is implemented on a TMS320C54x and achieves quality virtually
indistinguishable from that of the federal standard MELP at bit rates between 1.0 and 1.6 kb/s. The backbone of the VBR-MELP
vocoder is similar to that of the federal standard MELP. It uses a novel sub-band based voice activity detector
in the back end of the encoder to discriminate background noise from speech activity. Since the proposed detector
uses only parameters extracted in the encoder, its computational complexity is very low.
|
ITT-6.5
|
Improving EVRC Half Rate by the Algebraic VQ-CELP
Fenghua Liu,
Ryan Heidari (Nokia Mobile Phones)
This paper presents an algebraic vector quantized codebook excited linear prediction (AVQ-CELP) speech codec. The objective is to enhance the half-rate mode of IS-127, the enhanced variable rate codec (EVRC). In the AVQ-CELP scheme, only the perceptually important components are encoded, and the components are selected in a way similar to ACELP. An open-loop procedure is used to select the sub-vectors. The selected sub-vectors are concatenated and vector quantized. An analysis-by-synthesis strategy is used to determine the optimal excitation. The generalized Lloyd algorithm (GLA) is used to optimize the AVQ codebook. To improve the synthesis quality of voiced frames, a two-pulse version of ACELP is used in strongly voiced frames. The proposed algorithm was incorporated in the Nokia CDMA handset prototype. In a joint effort with SK Telecom, a field test was performed in Korea to evaluate the performance of the proposed AVQ algorithm. The results indicate a considerable improvement relative to the standard EVRC operating at the maximum half rate.
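
For readers unfamiliar with the generalized Lloyd algorithm named above, here is a minimal generic sketch of codebook training with a squared-error distortion measure. It is not the authors' AVQ codebook design, whose details are not given in the abstract; the function names, the toy training set, and the choice of initial codebook are all assumptions made for illustration.

```python
# Generic generalized Lloyd algorithm (GLA) for vector quantizer design.
# Illustrative only: names and data are hypothetical, not from the paper.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, vector):
    """Index of the codeword closest to `vector` (ties go to the lowest index)."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(codebook[i], vector))


def generalized_lloyd(training, initial_codebook, max_iterations=50):
    """Alternate nearest-neighbour partitioning and centroid updates."""
    codebook = [list(c) for c in initial_codebook]
    for _ in range(max_iterations):
        # Step 1: partition the training set into nearest-neighbour cells.
        cells = [[] for _ in codebook]
        for v in training:
            cells[nearest(codebook, v)].append(v)
        # Step 2: replace each codeword by the centroid of its cell.
        updated = []
        for codeword, cell in zip(codebook, cells):
            if cell:
                dim = len(cell[0])
                updated.append([sum(v[d] for v in cell) / len(cell)
                                for d in range(dim)])
            else:
                updated.append(codeword)  # keep the codeword of an empty cell
        if updated == codebook:           # no change: converged
            break
        codebook = updated
    return codebook


# Toy training set with two well-separated groups of 2-D vectors.
training = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
codebook = generalized_lloyd(training, [training[0], training[-1]])
print(codebook)
```

The algorithm only reaches a local optimum, so the trained codebook depends on the initial one; practical designs usually try several initializations or grow the codebook by splitting.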
|
ITT-6.6
|
A 4 kbps Adaptive Fixed Code-Excited Linear Prediction Speech Coder
Hong Kook Kim (AT&T Labs Research, Rm. E148, 180 Park Avenue, Florham Park NJ 07932, USA),
Mi Suk Lee,
Hwang Soo Lee (Dept. of Electrical Eng., Korea Advanced Institute of Science and Technology, 373-1 Kusong-Dong, Yusong-Gu, Taejon 305-701, Korea)
In this paper, we propose an adaptive fixed code-excited
linear prediction (AF-CELP) speech coder operating at 4 kbps.
By exploiting the fact that the fixed codebook contribution
to the speech signal is periodic, like the corresponding adaptive
codebook contribution, the adaptive fixed codebook model
efficiently represents excitation signals. To
overcome the quality degradation caused by the coarse
quantization of the excitation, a paired-pulse algebraic
codebook structure is also applied to the excitation
model. Additionally, pitch prefiltering, noise
spreading, and harmonic enhancement techniques are
adopted in the decoding process. Spectrogram reading
and informal listening tests showed that the AF-CELP coder
reproduces high-quality speech.
|
ITT-6.7
|
Multimode Variable Bit Rate Speech Coding: An Efficient Paradigm for High-Quality Low-Rate Representation of Speech Signal
Amitava Das,
Andy DeJaco,
Sharath Manjunath,
Ananth Ananthapadmanabhan,
Jeff Huang,
Eddie Choy (Qualcomm, Inc.)
The speech signal consists of a time-varying ensemble of different types of segments
with distinct characteristics, which require different degrees of coding resolution
in order to retain an overall high voice quality. A fixed-rate coder can capture such
time-varying characteristics only if it operates at a high enough bit rate. At low bit
rates, a fixed-rate coder will not be able to capture all of these various segments
well and will fail to render high voice quality. A multimode variable bit rate (VBR)
coder uses an arsenal of modes, operating at different bit rates. These modes
are designed to represent these different speech segments optimally with the
right amount of coding resolution. Thus, a multimode VBR codec adapts
the coding mechanism to the input speech and delivers high quality at low
(average) rates. This paper presents the essential framework and the unique
advantages of a multimode VBR codec and suggests algorithms for the
different modes.
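
The "low (average) rate" of a multimode coder is the mode rates weighted by how often each mode is used. The sketch below shows that arithmetic. The mode names, rates, and usage fractions are hypothetical values chosen for illustration; the abstract does not list them.

```python
# Average bit rate of a multimode VBR coder.
# All mode names and numbers below are hypothetical, not from the paper.

def average_bit_rate(modes):
    """Return the average rate in kbps.

    `modes` maps a mode name to (rate_kbps, fraction_of_frames).
    """
    total_fraction = sum(fraction for _, fraction in modes.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("mode fractions must sum to 1")
    return sum(rate * fraction for rate, fraction in modes.values())


EXAMPLE_MODES = {
    "silence":   (1.0, 0.40),
    "unvoiced":  (2.0, 0.20),
    "voiced":    (4.0, 0.30),
    "transient": (8.0, 0.10),
}

average = average_bit_rate(EXAMPLE_MODES)
peak = max(rate for rate, _ in EXAMPLE_MODES.values())
print(f"average {average:.1f} kbps, peak {peak:.1f} kbps")
```

With these assumed numbers the coder averages 2.8 kbps while still spending 8 kbps on the segments that need it; a fixed-rate coder would have to run at the peak rate all the time to code those segments equally well.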
|
ITT-6.8
|
Segmental Prototype Interpolation Coding
Costas S Xydeas,
Thomas M Chapman (University of Manchester, UK)
Current parametric speech coding schemes can achieve high communications quality speech at bit rates in the range of 1.5 to 2.4 kbits/sec.
Most schemes sample and quantise, at regular intervals, the "tracks in time" generated by the parameters of the speech production model.
As a result, reconstructed "parameter tracks" do not evolve "smoothly" with time.
Furthermore, no advantage is taken of the "linguistic event" nature of speech.
In this paper, model parameter "time tracks" are split into non-overlapping speech "event" related segments.
These segment-based evolutions of model parameters are then vector quantised to provide at the receiver a smooth and subjectively meaningful reconstruction.
The paper then presents an application of this generic segmental speech model quantisation approach to a 1.5 kbits/sec Prototype Interpolation Coding (PIC) system.
Results indicate that the proposed methodology can almost halve the bit rate of this PIC system while preserving overall recovered speech quality.
|
|