Topics in Speech Coding

Performance Assessment Of Tandem Connection Of Enhanced Cellular Coders

Authors:

Simao F Campos Neto
Franklin L Corcoran
Ara Karahisar

Page (NA) Paper number 1313

Abstract:

The growth of, and increased competition in, the second-generation (digital) cellular communication market have led service providers to improve the speech quality of their systems by introducing enhanced speech coders. Advances in speech coding have allowed designers to aim at toll quality for these enhanced coders, and it is necessary to investigate the impact of these coders on the end-to-end quality of the public switched telephone network (PSTN). This paper continues a series of studies on the impact of tandem connections of cellular systems; here, the quality of the enhanced cellular coders for the major systems in use today is studied in the context of PSTN interconnection. A major conclusion of this study is that deploying enhanced coders in second-generation cellular systems makes possible a substantial increase in the quality of cellular connections that are in tandem with other speech coders in long-haul international networks.

IC991313.PDF (From Author) IC991313.PDF (Rasterized)


TTS Based Very Low Bit Rate Speech Coder

Authors:

Ki-Seung Lee
Richard V. Cox

Page (NA) Paper number 1457

Abstract:

This paper addresses a speech coder that uses a Text-To-Speech (TTS) synthesis system to achieve very low bit rates (below 1 kbps). The main issue of the work is the accurate coding of the pitch (F0) and gain contours, which are principal components of prosody. This is of paramount interest because correct prosody increases naturalness and an efficient coding scheme provides high coding gain. Together with the phonetic transcription, the F0 and gain contours constitute the parameters that the TTS system needs to synthesize the speech signal. Piecewise linear approximation is used to code the F0 parameter. A technique that minimizes the bit rate while keeping the F0 error below a given threshold is described. To obtain both high compression and smoothly changing gain contours, the variance of the signal, averaged over each half-phoneme length, is transmitted as gain information. With single-speaker stimuli and a priori text transcription information, we obtained natural-sounding speech at an average bit rate of about 300 bps.
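As an illustration of coding a contour by piecewise linear approximation under an error threshold, the following is a minimal sketch. It uses a simple greedy breakpoint search, which is an assumption made here for illustration and not the bit-rate-minimizing technique described in the paper; the function names and the sample contour are hypothetical.

```python
def segment_error(f0, i, j):
    """Largest absolute deviation between the samples f0[i..j] and the
    straight line joining f0[i] and f0[j]."""
    worst = 0.0
    for k in range(i, j + 1):
        interp = f0[i] + (f0[j] - f0[i]) * (k - i) / (j - i)
        worst = max(worst, abs(f0[k] - interp))
    return worst


def greedy_breakpoints(f0, max_err):
    """Pick breakpoint indices greedily: extend each segment one sample at a
    time and stop at the first extension whose error exceeds max_err.
    (Illustrative heuristic only; it does not guarantee the fewest segments.)"""
    n = len(f0)
    if n == 0:
        return []
    breakpoints = [0]
    start = 0
    while start < n - 1:
        end = start + 1
        while end + 1 < n and segment_error(f0, start, end + 1) <= max_err:
            end += 1
        breakpoints.append(end)
        start = end
    return breakpoints


def reconstruct(f0, breakpoints):
    """Rebuild the contour by linear interpolation between the kept samples."""
    out = [f0[breakpoints[0]]]
    for i, j in zip(breakpoints, breakpoints[1:]):
        for k in range(i + 1, j + 1):
            out.append(f0[i] + (f0[j] - f0[i]) * (k - i) / (j - i))
    return out


# Hypothetical F0 contour in Hz (one value per frame), rising then falling.
contour = [100, 110, 120, 130, 120, 110, 100]
points = greedy_breakpoints(contour, max_err=1.0)
rebuilt = reconstruct(contour, points)
```

Only the breakpoint positions and their F0 values would need to be transmitted; a larger threshold yields fewer breakpoints and therefore fewer bits, at the cost of a coarser contour.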

IC991457.PDF (From Author) IC991457.PDF (Rasterized)


Wideband Speech Coding With Toll Quality Based On IA-Model

Authors:

Ling Kok Ng, School of EEE, Nanyang Technological University, Singapore 639798
Gang Li, School of EEE, Nanyang Technological University, Singapore 639798
Xiao Lin, Center for Signal Processing, Nanyang Technological University, Singapore 639798
Guoan Bi, School of EEE, Nanyang Technological University, Singapore 639798

Page (NA) Paper number 1516

Abstract:

In this paper, we propose an instantaneous amplitude (IA) based model for speech signal representation. The model avoids the difficulty of dealing with time-varying phases and makes it easy to run an optimization procedure that brings the synthetic signal as close as possible to the original. A simplified frequency-picking algorithm is derived to shorten the processing time while maintaining the quality of the synthetic speech. Experiments show that speech synthesized with the developed technique is of toll quality and almost perceptually indistinguishable from the original speech. Initial work on coding the IA model parameters for speech sampled at 16 kHz has been carried out, and toll-quality synthesized speech at a bit rate of 40 kbps is achieved.

IC991516.PDF (From Author) IC991516.PDF (Rasterized)


4 kb/s Multi-Pulse Based CELP Speech Coding Using Excitation Switching

Authors:

Kazunori Ozawa

Page (NA) Paper number 1656

Abstract:

This paper proposes MP-CELP (Multi-Pulse-based CELP) speech coding at 4 kb/s. In MP-CELP, the amplitudes or signs of the multi-pulse excitation are simultaneously vector quantized (VQ). To improve speech quality in background noise conditions, the excitation signal is switched between voiced and unvoiced speech, and the number of pulses is greatly increased for unvoiced speech by restricting the pulse locations. Further, to improve voiced speech quality, a delayed-decision search selects the combination of adaptive codebook lag, pulse locations, sign codevector, and gain codevector that minimizes distortion. Subjective evaluation results show that the speech quality of 4 kb/s MP-CELP is close to that of ITU-T G.723.1 (6.3 kb/s) and G.729 (8 kb/s) in the M-IRS clean speech condition. In background noise conditions, the introduction of excitation switching and pulse location restriction improves the MOS by 0.4. However, further improvement is still required, except in the interfering talker condition.

IC991656.PDF (From Author) IC991656.PDF (Rasterized)


An Adaptive Multi-Rate Speech Coder For Digital Cellular Telephony

Authors:

Erdal Paksoy
Juan Carlos De Martin
Alan V McCree
Christian G Gerlach
Anand Anandakumar
Wai-Ming Lai
Vishu R Viswanathan

Page (NA) Paper number 1986

Abstract:

We have developed an adaptive multi-rate (AMR) speech coder designed to operate under the GSM digital cellular full rate (22.8 kb/s) and half rate (11.4 kb/s) channels and to maintain high quality in the presence of highly varying background noise and channel conditions. Within each total rate, several codec modes with different source/channel bit rate allocations are used. The speech coders in each codec mode are based on the CELP algorithm operating at rates ranging from 11.85 kb/s down to 5.15 kb/s, where the lowest rate coder is a source controlled multi-modal speech coder. The decoders monitor channel quality at both ends of the wireless link using the soft values for the received bits and assist the base station in selecting the codec mode that is appropriate for a given channel condition. The coder was submitted to the GSM AMR standardization competition and met the qualification requirements in an independent formal MOS test.
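The split between source and channel bits within a fixed total rate can be made concrete with a little arithmetic. The sketch below uses only the figures stated in the abstract (22.8 and 11.4 kb/s total rates; 11.85 and 5.15 kb/s as the highest and lowest source rates). The intermediate codec modes are not listed in the abstract and are left out, and the selection rule and its `required_protection_bps` parameter are hypothetical, not the authors' mode-adaptation algorithm.

```python
# Total (gross) channel rates in bit/s, as stated in the abstract.
GROSS_RATE_BPS = {"full_rate": 22800, "half_rate": 11400}

# Only the two end-point source rates given in the abstract; the
# intermediate modes are not specified there and are omitted here.
SOURCE_RATES_BPS = [11850, 5150]


def channel_coding_rate(channel, source_rate_bps):
    """Bits per second left for channel coding after the speech coder's share."""
    gross = GROSS_RATE_BPS[channel]
    if source_rate_bps > gross:
        raise ValueError("source rate does not fit in this channel")
    return gross - source_rate_bps


def select_source_rate(channel, required_protection_bps):
    """Hypothetical rule: use the highest source rate that still leaves at
    least required_protection_bps for channel coding; otherwise fall back to
    the lowest source rate."""
    gross = GROSS_RATE_BPS[channel]
    candidates = [r for r in SOURCE_RATES_BPS
                  if r <= gross and gross - r >= required_protection_bps]
    return max(candidates) if candidates else min(SOURCE_RATES_BPS)


good_channel_mode = select_source_rate("full_rate", required_protection_bps=10000)
bad_channel_mode = select_source_rate("full_rate", required_protection_bps=15000)
```

Under these assumptions, a clean channel keeps the 11.85 kb/s coder with 10.95 kb/s of protection, while a poor channel drops to 5.15 kb/s and spends 17.65 kb/s on protection.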

IC991986.PDF (From Author) IC991986.PDF (Rasterized)


An Adaptive Post-Filtering Technique Based on The Modified Yule-Walker Filter

Authors:

Azhar Mustapha, COMSAT Laboratories, Clarksburg, Maryland, USA
Suat Yeldener, COMSAT Laboratories, Clarksburg, Maryland, USA

Page (NA) Paper number 2024

Abstract:

This paper presents an adaptive time-domain post-filtering technique based on the modified Yule-Walker filter. Conventionally, the post-filter is derived from the original LPC spectrum. In general, this conventional time-domain technique produces an unpredictable spectral tilt that is hard to control with the modified LPC synthesis, inverse, and high-pass filtering, and it causes unnecessary attenuation or amplification of some frequency components, which makes the speech sound muffled. This effect increases when voice coders are connected in tandem. Another approach to post-filter design was developed by McAulay and Quatieri, but it can be used only in sinusoidal-based speech coders. We have developed a new time-domain post-filtering technique that eliminates the spectral tilt problem and can be applied to various speech coders. The new post-filter has a flat frequency response at the formant peaks of the speech spectrum. Instead of relying on the modified LPC synthesis, inverse, and high-pass filtering of the conventional time-domain technique, the new technique gathers information about the poles of the LPC spectrum. This post-filtering technique has been used in a 4 kb/s Harmonic Excitation Linear Predictive Coder (HE-LPC), and subjective listening tests have indicated that it outperforms the conventional one in both one and two tandem connections.
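For context, the conventional time-domain post-filter that the abstract contrasts against is commonly written as a cascade of a bandwidth-expanded LPC inverse filter A(z/beta), a bandwidth-expanded LPC synthesis filter 1/A(z/alpha), and a first-order high-pass tilt compensator (1 - mu z^-1). The sketch below implements that conventional structure only, not the modified Yule-Walker design of the paper; the function names and the default values of `alpha`, `beta`, and `mu` are illustrative assumptions.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): the k-th LPC coefficient scaled by gamma**k."""
    return [c * gamma ** k for k, c in enumerate(a)]


def iir_filter(b, a, x):
    """Direct-form filter:
    y[n] = (sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k]) / a[0]."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


def conventional_postfilter(x, lpc, alpha=0.8, beta=0.5, mu=0.5):
    """Conventional LPC-based post-filter (illustrative parameter values).

    lpc holds the coefficients of A(z) = 1 + a1 z^-1 + ... with lpc[0] == 1.
    """
    numerator = bandwidth_expand(lpc, beta)     # modified inverse filter A(z/beta)
    denominator = bandwidth_expand(lpc, alpha)  # modified synthesis filter 1/A(z/alpha)
    shaped = iir_filter(numerator, denominator, x)
    # First-order high-pass stage that tries to offset the spectral tilt.
    return iir_filter([1.0, -mu], [1.0], shaped)


# With a trivial predictor A(z) = 1, only the tilt stage acts on an impulse.
impulse_response = conventional_postfilter([1.0, 0.0, 0.0], [1.0])
```

The fixed tilt stage only approximately cancels the tilt introduced by the ratio A(z/beta)/A(z/alpha), which is the control problem the abstract refers to.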

IC992024.PDF (From Author) IC992024.PDF (Rasterized)


A Modular Approach to Speech Enhancement with an Application to Speech Coding

Authors:

Anthony J Accardi
Richard V. Cox

Page (NA) Paper number 2099

Abstract:

Ephraim and Malah's MMSE-LSA speech enhancement algorithm, while robust and effective, is difficult to tune and adjust for the tradeoff between noise reduction and distortion. We suggest a means of generalizing this design, which allows for other estimators besides the MMSE-LSA to be used within the same supporting framework. When a modified version of Ephraim and Van Trees's spectral domain constrained signal subspace estimator is used in this manner, we obtain a system with greater flexibility and similar performance. We also explore the possibility of using different speech enhancement techniques as pre-processors for different parameter extraction modules of the IS-641 speech coder. We show that such a strategy can increase the quality of the coded speech and lead to a system that is more robust to differing noise types.

IC992099.PDF (From Author) IC992099.PDF (Rasterized)


On Speech Coding in a Perceptual Domain

Authors:

Gernot Kubin
W. Bastiaan Kleijn

Page (NA) Paper number 2327

Abstract:

In many speech coders, the distortion criterion operates on the speech signal or on a signal obtained by adaptive linear filtering of the speech signal. To satisfy computational and delay constraints, the distortion criterion must be reduced to a very simple approximation of the auditory system. This drawback of conventional approaches motivates a new speech coding paradigm in which the coding is performed in a domain where the single-letter squared-error criterion forms an accurate representation of perception. The new paradigm requires a model of the auditory periphery that is accurate, can be inverted with relatively low computational effort, and represents the signal with relatively few parameters. In this paper we develop such a model of the auditory periphery and discuss its suitability for speech coding. Our results indicate that the new paradigm in general, and our auditory model in particular, form a promising basis for speech and audio coding.

IC992327.PDF (From Author) IC992327.PDF (Rasterized)