Speech and Audio Technologies

A DSP Powered Solid State Audio System

Authors:

Jason D Kridner,
Mark T Nadeski,
Pedro R Gelabert,

Page (NA) Paper number 2137

Abstract:

New audio compression algorithms and non-volatile flash memory technology have enabled the creation of portable solid-state personal audio players. This paper presents a low-power portable audio system based on the Texas Instruments TMS320C5000 DSP family. The system is designed to play music and other audio content stored on flash memory cards that can hold over an hour of CD-quality music. A flash card provides higher audio quality than a cassette tape, yet is smaller and more durable than a CD. The music or audio material downloaded to the flash card is obtained from licensed distributors, either over the Internet or through kiosks set up in retail outlets. All audio decoding and watermarking required by the system is handled by the TMS320C5000 DSP. System performance characteristics are also presented.
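
The claim that a flash card can hold over an hour of CD-quality music rests on simple storage arithmetic. The sketch below compares one hour of raw CD audio with one hour of coded audio. The 128 kbit/s coded rate is a hypothetical assumption for illustration; the abstract does not state the codec or its bit rate.

```python
"""Storage arithmetic for a flash-based audio player (illustrative sketch)."""

# Red Book CD audio parameters: 44.1 kHz, 16-bit samples, two channels.
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2

# Hypothetical compressed bit rate; the abstract does not give the codec rate.
ASSUMED_CODED_RATE_BPS = 128_000


def cd_bit_rate_bps() -> int:
    """Uncompressed CD audio bit rate in bits per second."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS


def bytes_for_duration(bit_rate_bps: float, seconds: float) -> float:
    """Bytes of storage needed for `seconds` of audio at `bit_rate_bps`."""
    return bit_rate_bps * seconds / 8


def compression_ratio(coded_bit_rate_bps: float) -> float:
    """How many times smaller the coded stream is than raw CD audio."""
    return cd_bit_rate_bps() / coded_bit_rate_bps


if __name__ == "__main__":
    one_hour_s = 3600
    raw_mb = bytes_for_duration(cd_bit_rate_bps(), one_hour_s) / 1e6
    coded_mb = bytes_for_duration(ASSUMED_CODED_RATE_BPS, one_hour_s) / 1e6
    print(f"raw CD audio, 1 hour: {raw_mb:.1f} MB")
    print(f"coded at 128 kbit/s:  {coded_mb:.1f} MB")
    print(f"compression ratio:    {compression_ratio(ASSUMED_CODED_RATE_BPS):.1f}:1")
```

Under this assumption, one hour needs 57.6 MB of flash, compared with about 635 MB for uncompressed CD audio.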

IC992137.PDF (From Author) IC992137.PDF (Rasterized)

New Implementation Techniques Of A Real-Time MPEG-2 Audio Encoding System

Authors:

Hyen-O Oh,
Sung-Youn Kim,
Dae-Hee Youn,
Il-Whan Cha,

Page (NA) Paper number 1594

Abstract:

In this study, new implementation techniques for a real-time MPEG-2 audio encoding system are presented. The system is built from general-purpose DSPs. It consists of one master unit and five slave units, and its structure is based on our earlier work. Two fast algorithms are developed and applied to the most computationally intensive routines of the encoding process. These algorithms play a key role in improving overall system performance. The implemented system encodes audio signals into an MPEG-2 Layer II bitstream with full configurations of up to 5.1 channels and 640 kbps, and is intended to deliver state-of-the-art quality. The generated bitstream can be stored on a PC hard disk or sent to an integration system to be multiplexed with the corresponding video bitstream.
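
A real-time encoder must finish each frame within that frame's duration. The sketch below computes the per-frame time and bit budget for Layer II, which codes 1152 samples per channel in each frame. Several values are assumptions: the 48 kHz sampling rate, and the 100 MHz DSP clock, which is hypothetical and not taken from the paper. The 640 kbps figure is the maximum total rate quoted in the abstract.

```python
"""Per-frame time, bit and cycle budget for an MPEG-2 Layer II encoder (sketch)."""

# Layer II codes 1152 samples per channel in every frame.
LAYER_II_SAMPLES_PER_FRAME = 1152


def frame_duration_s(sample_rate_hz: int) -> float:
    """Duration of one Layer II frame in seconds (the real-time deadline)."""
    return LAYER_II_SAMPLES_PER_FRAME / sample_rate_hz


def bits_per_frame(total_bit_rate_bps: int, sample_rate_hz: int) -> float:
    """Total bits available per frame period, summed over all channels."""
    return total_bit_rate_bps * LAYER_II_SAMPLES_PER_FRAME / sample_rate_hz


def cycles_per_frame(clock_hz: int, sample_rate_hz: int) -> float:
    """Instruction cycles one processor has per frame, ignoring I/O overhead."""
    return clock_hz * LAYER_II_SAMPLES_PER_FRAME / sample_rate_hz


if __name__ == "__main__":
    fs = 48_000                      # assumed sampling rate
    rate = 640_000                   # maximum total rate from the abstract
    assumed_clock_hz = 100_000_000   # hypothetical DSP clock
    print(f"frame deadline: {frame_duration_s(fs) * 1e3:.1f} ms")
    print(f"bit budget:     {bits_per_frame(rate, fs):.0f} bits per frame")
    print(f"cycle budget:   {cycles_per_frame(assumed_clock_hz, fs):.0f} per DSP")
```

With a 24 ms deadline per frame, speeding up the most compute-intensive routines directly widens the margin each processor has.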

IC991594.PDF (From Author) IC991594.PDF (Rasterized)

An Improved Residual-Domain Phase/Amplitude Model for Sinusoidal Coding of Speech at Very Low Bit Rates: A Variable Rate Scheme

Authors:

Sassan Ahmadi,

Page (NA) Paper number 2193

Abstract:

An improved harmonic sinusoidal model is presented, in which the underlying sine wave amplitudes and phases are efficiently represented using a combination of linear prediction, linear phase alignment, all-pass filtering, and spectral sampling in the residual domain. The analysis and synthesis systems are introduced, and the derivation and encoding of each model parameter are discussed. Performance analysis on a large database indicates effective modeling of the sinusoidal parameters. A variable-rate sinusoidal coder based on the proposed model, operating at an average bit rate of 1.75 kbps, has been developed and yields reproduced speech of good quality, intelligibility, and naturalness. The proposed model may find applications in low bit rate speech coding for high-capacity wireless communication systems.
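
At its core, a harmonic sinusoidal model synthesizes each voiced frame as a sum of harmonics of the fundamental, s[n] = Σ_k A_k cos(k·ω0·n + φ_k). The sketch below shows only that harmonic sum. It does not reproduce the paper's residual-domain representation of the amplitudes and phases (linear prediction, phase alignment, all-pass filtering). The frame parameters in the usage lines are hypothetical.

```python
import math
from typing import List, Sequence


def synthesize_harmonic_frame(
    f0_hz: float,
    amplitudes: Sequence[float],
    phases: Sequence[float],
    sample_rate_hz: float,
    num_samples: int,
) -> List[float]:
    """Return s[n] = sum_k A_k * cos(k * w0 * n + phi_k) for k = 1..K.

    Harmonics at or above the Nyquist frequency are dropped to avoid aliasing.
    """
    if len(amplitudes) != len(phases):
        raise ValueError("need one phase per amplitude")
    w0 = 2.0 * math.pi * f0_hz / sample_rate_hz  # fundamental in rad/sample
    out = [0.0] * num_samples
    for k, (a_k, phi_k) in enumerate(zip(amplitudes, phases), start=1):
        if k * w0 >= math.pi:  # this and all higher harmonics would alias
            break
        for n in range(num_samples):
            out[n] += a_k * math.cos(k * w0 * n + phi_k)
    return out


if __name__ == "__main__":
    # Hypothetical 20 ms voiced frame at 8 kHz: 125 Hz pitch, three harmonics.
    frame = synthesize_harmonic_frame(125.0, [1.0, 0.5, 0.25], [0.0, 0.3, -0.7], 8000.0, 160)
    print(len(frame), max(frame))
```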

IC992193.PDF (From Author) IC992193.PDF (Rasterized)

Implementation of an Enhanced Fixed Point Variable Bit-Rate MELP Vocoder on TMS320C549

Authors:

Ali E Ertan,
Emre B Aksu,
Hakki G Ilk,
Haydar Karci,
Önder Karpat,
Taner Kolçak,
Levent Şendur,
Mubeccel Demirekler,
Ahmet Enis Çetin,

Page (NA) Paper number 1758

Abstract:

In this paper, a fixed-point Variable Bit-Rate (VBR) Mixed Excitation Linear Prediction (MELP) vocoder is presented. The VBR-MELP vocoder is implemented on a TMS320C54x and achieves quality virtually indistinguishable from that of the federal standard MELP at bit rates between 1.0 and 1.6 kb/s. The backbone of the VBR-MELP vocoder is similar to that of the federal standard MELP. It uses a novel sub-band-based voice activity detector at the back end of the encoder to discriminate background noise from speech activity. Since the proposed detector uses only parameters already extracted by the encoder, its computational complexity is very low.
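
A sub-band voice activity detector that reuses encoder parameters can be cheap because it needs only a few comparisons per frame. The sketch below is a generic sub-band energy detector, not the detector from the paper. It makes three assumptions, all hypothetical: that per-band energies are available from the encoder's analysis, that the first frame is background noise, and that the threshold, band count, and smoothing constant take the values shown. A frame counts as speech when enough bands rise a set number of decibels above a tracked noise floor.

```python
import math
from typing import List, Optional, Sequence

_EPS = 1e-12  # guards log10 against zero-energy bands


class SubbandVad:
    """Hypothetical sub-band energy voice activity detector (illustrative only)."""

    def __init__(self, num_bands: int, threshold_db: float = 6.0,
                 min_active_bands: int = 2, floor_smoothing: float = 0.95) -> None:
        self.num_bands = num_bands
        self.threshold_db = threshold_db
        self.min_active_bands = min_active_bands
        self.floor_smoothing = floor_smoothing
        self.noise_floor: Optional[List[float]] = None

    def is_speech(self, band_energies: Sequence[float]) -> bool:
        if len(band_energies) != self.num_bands:
            raise ValueError("wrong number of bands")
        energies = [max(e, _EPS) for e in band_energies]
        if self.noise_floor is None:
            # Assume the first frame is background noise and seed the floor.
            self.noise_floor = energies
            return False
        # Count bands whose energy exceeds the noise floor by threshold_db.
        active = sum(
            1 for e, floor in zip(energies, self.noise_floor)
            if 10.0 * math.log10(e / floor) > self.threshold_db
        )
        speech = active >= self.min_active_bands
        if not speech:
            # Update the noise floor only during non-speech frames.
            a = self.floor_smoothing
            self.noise_floor = [a * f + (1.0 - a) * e
                                for f, e in zip(self.noise_floor, energies)]
        return speech
```

In a VBR coder, frames this detector flags as non-speech could be sent with a smaller bit allocation, which is one way an average rate below the fixed-rate standard can arise.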

IC991758.PDF (From Author) IC991758.PDF (Rasterized)

Improving EVRC Half Rate by the Algebraic VQ-CELP

Authors:

Fenghua Liu,
Ryan Heidari,

Page (NA) Paper number 3020

Abstract:

This paper presents an algebraic vector quantized codebook excited linear prediction (AVQ-CELP) speech codec. The objective is to enhance the half-rate mode of IS-127, the Enhanced Variable Rate Codec (EVRC). In the AVQ-CELP scheme, only the perceptually important components are encoded, and these components are selected in a way similar to ACELP. An open-loop procedure is used to select the sub-vectors. The selected sub-vectors are concatenated and vector quantized. An analysis-by-synthesis strategy is used to determine the optimal excitation. The generalized Lloyd algorithm (GLA) is used to optimize the AVQ codebook. To improve the synthesis quality of voiced frames, a two-pulse version of ACELP is used in strongly voiced frames. The proposed algorithm was incorporated into the Nokia CDMA handset prototype. In a joint effort with SK Telecom, field tests were performed in Korea to evaluate the performance of the proposed AVQ algorithm. The results indicate a considerable improvement over the standard EVRC operating at the maximum half rate.
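
The sketch below illustrates two steps the abstract names: open-loop selection of sub-vectors, and codebook training with the generalized Lloyd algorithm. The selection rule is a hypothetical assumption: it keeps the highest-energy sub-vectors. The GLA shown is the textbook nearest-neighbour/centroid iteration with Euclidean distance. The paper's analysis-by-synthesis excitation search is not reproduced.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def select_subvectors(target: Sequence[float], sub_len: int,
                      num_keep: int) -> Tuple[List[int], Vector]:
    """Open-loop selection: keep the num_keep highest-energy sub-vectors.

    Returns the kept sub-vector indices (in time order) and their concatenation.
    The energy criterion is a hypothetical stand-in for the paper's rule.
    """
    if len(target) % sub_len:
        raise ValueError("target length must be a multiple of sub_len")
    subs = [list(target[i:i + sub_len]) for i in range(0, len(target), sub_len)]
    energies = [sum(x * x for x in s) for s in subs]
    ranked = sorted(range(len(subs)), key=lambda i: energies[i], reverse=True)
    kept = sorted(ranked[:num_keep])
    return kept, [x for i in kept for x in subs[i]]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(codebook: List[Vector], v: Sequence[float]) -> int:
    return min(range(len(codebook)), key=lambda j: _sq_dist(codebook[j], v))


def train_gla(training: List[Vector], codebook_size: int,
              iterations: int = 20, seed: int = 0) -> List[Vector]:
    """Generalized Lloyd algorithm: alternate nearest-neighbour and centroid steps."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(training, codebook_size)]
    for _ in range(iterations):
        cells: List[List[Vector]] = [[] for _ in range(codebook_size)]
        for v in training:
            cells[_nearest(codebook, v)].append(v)
        for j, cell in enumerate(cells):
            if cell:  # empty cells keep their previous codeword
                dim = len(cell[0])
                codebook[j] = [sum(v[d] for v in cell) / len(cell) for d in range(dim)]
    return codebook
```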

IC993020.PDF (From Author) IC993020.PDF (Rasterized)

A 4 kbps Adaptive Fixed Code-Excited Linear Prediction Speech Coder

Authors:

Hong Kook Kim, AT&T Labs Research, Rm. E148, 180 Park Avenue, Florham Park NJ 07932, USA (USA)
Mi Suk Lee, Dept. of Electrical Eng., Korea Advanced Institute of Science and Technology, 373-1 Kusong-Dong, Yusong-Gu, Taejon 305-701, Korea (Korea)
Hwang Soo Lee, Dept. of Electrical Eng., Korea Advanced Institute of Science and Technology, 373-1 Kusong-Dong, Yusong-Gu, Taejon 305-701, Korea (Korea)

Page (NA) Paper number 1329

Abstract:

In this paper, we propose an adaptive fixed code-excited linear prediction (AF-CELP) speech coder operating at 4 kbps. By exploiting the fact that the fixed codebook contribution to the speech signal is periodic, like the corresponding adaptive codebook contribution, the adaptive fixed codebook model represents excitation signals efficiently. To overcome the quality degradation caused by coarse quantization of the excitation, a paired-pulse algebraic codebook structure is also applied to the excitation model. Additionally, pitch prefiltering, noise spreading, and harmonic enhancement techniques are adopted in the decoding process. Spectrogram reading and informal listening tests showed that the AF-CELP coder reproduces high-quality speech.
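
One simple way to make a fixed-codebook vector periodic at the pitch lag T is to pass it through the recursion y[n] = c[n] + β·y[n−T]. This repeats each pulse one pitch period later with a decaying gain. The sketch below shows that idea on a sparse algebraic pulse vector. It illustrates the periodicity argument only and is not the AF-CELP model from the paper. The lag, gain, and pulse positions in the test are hypothetical.

```python
from typing import List, Sequence


def algebraic_pulses(length: int, positions: Sequence[int],
                     signs: Sequence[float]) -> List[float]:
    """Build a sparse fixed-codebook vector with signed unit pulses."""
    c = [0.0] * length
    for p, s in zip(positions, signs):
        c[p] += s
    return c


def periodic_extension(c: Sequence[float], pitch_lag: int, gain: float) -> List[float]:
    """Apply 1 / (1 - gain * z^-T): y[n] = c[n] + gain * y[n - T].

    Each pulse reappears every pitch_lag samples, scaled by successive powers of gain.
    """
    if pitch_lag <= 0:
        raise ValueError("pitch_lag must be positive")
    y = list(c)
    for n in range(pitch_lag, len(y)):
        y[n] += gain * y[n - pitch_lag]  # y[n - T] is already filtered (recursive)
    return y
```

The recursive form is used so that a pulse keeps repeating at every multiple of the lag within the subframe, rather than only once.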

IC991329.PDF (From Author)

Multimode Variable Bit Rate Speech Coding: An Efficient Paradigm For High-Quality Low-Rate Representation Of Speech Signal

Authors:

Amitava Das,
Andy DeJaco,
Sharath Manjunath,
Ananth Ananthapadmanabhan,
Jeff Huang,
Eddie Choy,

Page (NA) Paper number 3014

Abstract:

The speech signal consists of a time-varying ensemble of different types of segments with distinct characteristics, which require different degrees of coding resolution to retain high overall voice quality. A fixed-rate coder can capture such time-varying characteristics only if it operates at a sufficiently high bit rate. At low bit rates, a fixed-rate coder cannot capture all of these segments well and fails to deliver high voice quality. A multimode variable bit rate (VBR) coder uses an arsenal of modes operating at different bit rates. These modes are designed to represent the different speech segments optimally, each with the right amount of coding resolution. Thus, a multimode VBR codec adapts its coding mechanism to the input speech and delivers high quality at low (average) rates. This paper presents the essential framework and the unique advantages of a multimode VBR codec and suggests algorithms for the different modes.
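
The multimode argument can be made concrete: classify each frame into a mode, give each mode its own rate, and average over the frames. In the sketch below, the mode names, feature thresholds, and per-mode rates are all hypothetical, not the modes or rates proposed in the paper. It also assumes equal-length frames, so the average rate is a simple mean.

```python
from typing import Dict, Sequence

# Hypothetical mode table in kbit/s; these are NOT the rates used in the paper.
MODE_RATES_KBPS: Dict[str, float] = {
    "silence": 0.8,
    "unvoiced": 2.0,
    "voiced": 4.0,
    "transient": 8.0,
}


def classify_frame(energy_db: float, voicing: float, spectral_change: float) -> str:
    """Pick a coding mode from three frame features (thresholds are hypothetical)."""
    if energy_db < -50.0:
        return "silence"
    if spectral_change > 0.5:
        return "transient"  # onsets and rapid changes get the most bits
    if voicing > 0.6:
        return "voiced"
    return "unvoiced"


def average_rate_kbps(frame_modes: Sequence[str],
                      mode_rates: Dict[str, float] = MODE_RATES_KBPS) -> float:
    """Average bit rate over equal-length frames."""
    if not frame_modes:
        raise ValueError("no frames")
    return sum(mode_rates[m] for m in frame_modes) / len(frame_modes)


if __name__ == "__main__":
    features = [(-60.0, 0.0, 0.0), (-20.0, 0.9, 0.1), (-20.0, 0.9, 0.1), (-25.0, 0.2, 0.8)]
    modes = [classify_frame(*f) for f in features]
    print(modes, average_rate_kbps(modes))
```

The average rate stays well below the top mode's rate whenever silence and steady segments dominate, which is the basis of the "high quality at low (average) rates" claim.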

IC993014.PDF (From Author) IC993014.PDF (Rasterized)

Segmental Prototype Interpolation Coding

Authors:

Costas C.S. Xydeas, University of Manchester, UK (U.K.)
Thomas M Chapman, University of Manchester, UK (U.K.)

Page (NA) Paper number 2340

Abstract:

Current parametric speech coding schemes can achieve high, communications-quality speech at bit rates in the range of 1.5 to 2.4 kbits/sec. Most schemes sample and quantise, at regular intervals, the "tracks in time" generated by the parameters of the speech production model. As a result, reconstructed "parameter tracks" do not evolve "smoothly" with time. Furthermore, no advantage is taken of the "linguistic event" nature of speech. In this paper, model parameter "time tracks" are split into non-overlapping, speech "event" related segments. These segment-based evolutions of model parameters are then vector quantised to provide a smooth and subjectively meaningful reconstruction at the receiver. The paper then applies this generic segmental speech model quantisation approach to a 1.5 kbits/sec Prototype Interpolation Coding (PIC) system. Results indicate that the proposed methodology can almost halve the bit rate of this PIC system while preserving overall recovered speech quality.
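
Segments of a parameter track vary in length, but a vector quantiser needs fixed-dimension vectors. A common workaround is to time-normalise each segment by resampling it to a fixed number of points, then stretch it back to its original length at the decoder. The sketch below shows that idea with linear interpolation. The helper names, the segment boundaries, and the use of linear resampling are hypothetical illustrations, not the method of the paper. The VQ step itself is omitted.

```python
from typing import List, Sequence


def resample_linear(track: Sequence[float], num_points: int) -> List[float]:
    """Linearly resample a track to num_points evenly spaced samples (endpoints kept)."""
    if len(track) < 2 or num_points < 2:
        raise ValueError("need at least two input and two output points")
    scale = (len(track) - 1) / (num_points - 1)
    out = []
    for i in range(num_points):
        x = i * scale
        j = min(int(x), len(track) - 2)  # index of the left neighbour
        frac = x - j
        out.append(track[j] * (1.0 - frac) + track[j + 1] * frac)
    return out


def split_at(track: Sequence[float], boundaries: Sequence[int]) -> List[List[float]]:
    """Split a track into non-overlapping segments starting at each boundary index."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [len(track)]
    return [list(track[a:b]) for a, b in zip(starts, ends)]


def encode_segment(segment: Sequence[float], dim: int) -> List[float]:
    """Time-normalise a variable-length segment to a fixed-dimension vector."""
    return resample_linear(segment, dim)


def decode_segment(vector: Sequence[float], length: int) -> List[float]:
    """Stretch a fixed-dimension vector back to the segment's original length."""
    return resample_linear(vector, length)


if __name__ == "__main__":
    # Hypothetical pitch track (Hz) with one event boundary at index 4.
    track = [100.0, 102.0, 104.0, 106.0, 120.0, 118.0, 116.0, 114.0, 112.0]
    for seg in split_at(track, [4]):
        vec = encode_segment(seg, 3)  # this vector would be vector quantised
        print(seg, "->", vec, "->", decode_segment(vec, len(seg)))
```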

IC992340.PDF (From Author) IC992340.PDF (Rasterized)
