Low Bit Rate Speech Coding II

Split Band CELP (SB-CELP) Speech Coder

Authors:

Mohammad Reza Nakhai
Farokh A Marvasti

Page (NA) Paper number 1257

Abstract:

In this paper, we discuss the split band code-excited linear prediction (SB-CELP) speech coder, which employs an iterative version of the harmonic sinusoidal coding algorithm to encode the periodic content of the speech signal. The speech spectrum is split into two frequency regions, one of harmonic and one of random components, and a reliable fundamental frequency is estimated for the harmonic region using both the speech spectrum and its linear predictive (LP) residual spectrum. The resulting sinusoidal parameters are interpolated to reconstruct the periodicity in the speech waveform. The level of periodicity is controlled by computing a cutoff frequency between the harmonic and random regions of the spectrum. The random part of the spectrum and unvoiced speech are processed using the CELP coding algorithm. The SB-CELP speech coder, which combines the powerful features of the sinusoidal and CELP coding algorithms, yields high quality synthetic speech at 4.05 kb/s.

IC991257.PDF (From Author) IC991257.PDF (Rasterized)


Log Amplitude Modeling of Sinusoids in Voiced Speech

Authors:

Najam Malik, School of Electrical Engineering, University of New South Wales, Australia
W. Harvey Holmes, School of Electrical Engineering, University of New South Wales, Australia

Page (NA) Paper number 1278

Abstract:

We present an algorithm for all-pole (envelope) modeling of the amplitudes of sinusoids present in voiced speech segments, which works even when the number of sinusoids is very small, as occurs with high-pitched speakers. In contrast to previous methods, this algorithm minimizes a squared error criterion in the log amplitude domain rather than the amplitude domain, and so is better matched to the properties of the human auditory system. A weighted iterative approach is used to get near-optimal solutions to this otherwise nonlinear problem. This new frequency domain log amplitude modeling (LAM) algorithm gives impressive results, especially in the case of high-pitched female voices where conventional linear prediction methods are inadequate. The algorithm can easily be generalized to develop pole-zero models.
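As a rough illustration of what a squared error criterion in the log amplitude domain could look like, the sketch below evaluates an all-pole envelope at the sinusoid frequencies and sums the squared differences of the logarithms. The function names, the convention A(z) = 1 + sum of a_i z^-i, and the unweighted sum are assumptions made for this example; the abstract does not give the exact criterion or the weighted iterative solver, so this only evaluates an error and does not minimize it.

```python
import cmath
import math


def all_pole_envelope(gain, coeffs, omega):
    """Magnitude of gain / A(e^{j*omega}), assuming A(z) = 1 + sum_i a_i z^-i."""
    denom = 1.0 + sum(
        a * cmath.exp(-1j * omega * (i + 1)) for i, a in enumerate(coeffs)
    )
    return gain / abs(denom)


def log_amplitude_error(amplitudes, omegas, gain, coeffs):
    """Unweighted squared error between log amplitudes and the log envelope.

    amplitudes: measured sinusoid amplitudes (all > 0)
    omegas:     sinusoid frequencies in radians per sample
    """
    return sum(
        (math.log(amp) - math.log(all_pole_envelope(gain, coeffs, w))) ** 2
        for amp, w in zip(amplitudes, omegas)
    )
```

Because the error is measured on logarithms, a mismatch of a fixed ratio counts equally for strong and weak sinusoids, which is the property the abstract relates to auditory perception.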

IC991278.PDF (From Author) IC991278.PDF (Rasterized)


1.2kbit/s Harmonic Coder Using Auditory Filters

Authors:

Minoru Kohata

Page (NA) Paper number 1356

Abstract:

In this paper, a very low bit rate speech coder operating at 1.2 kbit/s is proposed. Like the LPC vocoder, it requires only gain, pitch, and spectral information, but its quality is far superior. The synthesis method is a form of harmonic coding, using sinusoids whose frequencies are multiples of the fundamental frequency, where the amplitudes of the sinusoids are adaptively modulated using Gammatone filters as a perceptual weighting filter. The phases of the sinusoids are also adjusted so as to maximize the perceptual quality. In order to reduce the total bit rate to 1.2 kbit/s, a new segment coder for spectral information (LSP coefficients) using DP matching is also proposed. The quality of the synthesized speech improved by 0.45 in Mean Opinion Score (MOS) compared with that of the simple LPC vocoder operating at the same rate, and it was comparable to that of a 2.4 kbit/s MELP coder.
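The harmonic synthesis step described above, summing sinusoids at integer multiples of the fundamental frequency, can be sketched as follows. This is a minimal illustration only: the function name and parameters are hypothetical, amplitudes and phases are held constant over the frame, and the Gammatone-based amplitude modulation and phase adjustment of the paper are not modeled.

```python
import math


def synthesize_harmonics(f0_hz, amplitudes, phases, sample_rate_hz, num_samples):
    """Sum sinusoids at k * f0 for k = 1, 2, ...; stop at the Nyquist frequency."""
    nyquist = sample_rate_hz / 2.0
    frame = []
    for n in range(num_samples):
        t = n / sample_rate_hz
        sample = 0.0
        for k, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
            freq = k * f0_hz
            if freq >= nyquist:
                break  # higher harmonics would alias, so they are skipped
            sample += amp * math.cos(2.0 * math.pi * freq * t + phase)
        frame.append(sample)
    return frame
```

For example, `synthesize_harmonics(100.0, [1.0, 0.5], [0.0, 0.0], 8000, 160)` produces a 20 ms frame at an assumed 8 kHz sampling rate containing two harmonics of a 100 Hz fundamental.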

IC991356.PDF (From Author) IC991356.PDF (Rasterized)


Exponential Sinusoidal Modeling of Transitional Speech Segments

Authors:

Jesper Jensen
Søren Holdt Jensen
Egon Hansen

Page (NA) Paper number 1446

Abstract:

A generalized sinusoidal model for speech signal processing is studied. The main feature of the model is that the amplitude of each sinusoidal component is allowed to vary exponentially with time. We propose to use the model in transitional speech segments such as speech onsets and voiced/unvoiced transitions. Computer simulations with natural speech signals indicate substantially better modeling performance in both transitional and voiced regions compared with the traditional constant-amplitude sinusoidal model.
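One common way to write a sinusoidal model with exponentially varying amplitudes is s[n] = sum over k of A_k exp(d_k n) cos(w_k n + phi_k). The sketch below generates a frame from such a model; the parameterization and all names are assumptions for illustration, since the abstract does not state the exact form used or how the parameters are estimated.

```python
import math


def exp_sinusoid_frame(components, num_samples):
    """Generate a frame from exponentially modulated sinusoids.

    components: list of (amplitude, damping, omega, phase) tuples, where
                damping is per sample and omega is in radians per sample.
                damping > 0 grows (onset), damping < 0 decays (offset),
                damping == 0 gives the constant-amplitude model.
    """
    return [
        sum(a * math.exp(d * n) * math.cos(w * n + p) for a, d, w, p in components)
        for n in range(num_samples)
    ]
```

Setting every damping factor to zero recovers the traditional constant-amplitude sinusoidal model, which is why the exponential form can be viewed as a generalization of it.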

IC991446.PDF (From Author) IC991446.PDF (Rasterized)


Harmonic+Noise Coding Using Improved V/UV Mixing and Efficient Spectral Quantization

Authors:

Eric W. M. Yu, City University of Hong Kong
Cheung-Fat Chan, City University of Hong Kong

Page (NA) Paper number 1596

Abstract:

This paper presents a harmonic+noise speech coder which uses an efficient spectral quantization technique and a novel voiced/unvoiced (V/UV) mixing model. The harmonic magnitudes are coded at 23 bits/frame using the magnitude response of a linear predictive coding (LPC) system. The difference between the harmonic magnitudes and the sampled magnitude response is minimized by a closed-loop approach. The V/UV mixing is modeled by a smooth function which is derived from the speech spectrum envelope based on a flatness measure. The V/UV mixing model allows noise to be added in the harmonic portion of the speech spectrum so that buzziness is reduced. Because the V/UV mixing information is determined from the spectral parameters available in the decoder, no bits are needed for transmitting the V/UV information. A 1.4 kbps harmonic coder is developed. The speech quality of the coder is comparable to that of other harmonic coders operating at higher rates.
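A flatness measure of a spectrum is often taken as the ratio of the geometric mean to the arithmetic mean of the power values. The sketch below computes that standard ratio as an assumed stand-in for the measure mentioned above; the abstract does not specify which flatness measure is used or how it is mapped to the smooth V/UV mixing function, so that mapping is deliberately left out.

```python
import math


def spectral_flatness(power_spectrum):
    """Geometric mean divided by arithmetic mean of strictly positive power values.

    Returns a value in (0, 1]: close to 1 for a flat, noise-like spectrum
    and close to 0 for a peaky, strongly harmonic spectrum.
    """
    if not power_spectrum or any(p <= 0.0 for p in power_spectrum):
        raise ValueError("power_spectrum must be non-empty and strictly positive")
    count = len(power_spectrum)
    geometric_mean = math.exp(sum(math.log(p) for p in power_spectrum) / count)
    arithmetic_mean = sum(power_spectrum) / count
    return geometric_mean / arithmetic_mean
```

Since a decoder can evaluate such a measure from the envelope it has already received, a quantity of this kind fits the abstract's point that no extra bits are spent on V/UV information.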

IC991596.PDF (From Author) IC991596.PDF (Rasterized)


A 4 Kb/s Toll Quality Harmonic Excitation Linear Predictive Speech Coder

Authors:

Suat Yeldener, COMSAT Laboratories, Clarksburg, Maryland, USA

Page (NA) Paper number 1731

Abstract:

The Harmonic Excitation Linear Predictive Speech Coder (HE-LPC) is a technique derived from MBE and MB-LPC type speech coding algorithms. The HE-LPC coder has the potential to produce high quality speech at 4.8 kb/s and below. This coder employs a new pitch estimation and voicing technique. In addition, new DCT based LPC and residual amplitude quantization techniques have been developed. The 4 kb/s HE-LPC coder with a 14th order LPC filter was found to produce much better speech quality than various low rate speech coding standards, including the 3.6 kb/s INMARSAT Mini-M AMBE vocoder. In a formal ITU ACR test, the 4 kb/s HE-LPC vocoder was found to produce performance equivalent to 32 kb/s ADPCM and G.729 for both flat and modified IRS filtered clean input speech conditions. The HE-LPC algorithm can also be extended to cover bit rates in the 1.2 to 8 kb/s range, depending on the application.

IC991731.PDF (From Author) IC991731.PDF (Rasterized)


High Quality MELP Coding at Bit-Rates Around 4 kb/s

Authors:

Jacek Stachurski
Alan V McCree
Vishu R Viswanathan

Page (NA) Paper number 2072

Abstract:

Recently, a number of coding techniques have been reported to achieve near toll quality synthesized speech at bit-rates around 4 kb/s. These include variants of Code Excited Linear Prediction (CELP), Sinusoidal Transform Coding (STC) and Multi-Band Excitation (MBE). While CELP has been an effective technique for bit-rates above 6 kb/s, STC, MBE, Waveform Interpolation (WI) and Mixed Excitation Linear Prediction (MELP) models seem to be attractive at bit-rates below 3 kb/s. In this paper, we present a system to encode speech with high quality using MELP, a technique previously demonstrated to be effective at bit-rates of 1.6 to 2.4 kb/s. We have enhanced the MELP model, producing significantly higher speech quality at bit-rates above 2.4 kb/s. We describe the development and testing of a high quality 4 kb/s MELP coder.

IC992072.PDF (From Author) IC992072.PDF (Rasterized)


Pitch Quantization in Low Bit-Rate Speech Coding

Authors:

Thomas Eriksson
Hong-Goo Kang

Page (NA) Paper number 2329

Abstract:

This paper describes a new pitch quantization method for low bit-rate speech coding systems. The logarithm of the pitch period is quantized by a combination of two uniform quantizers, one working directly on logarithmic pitch values and the other working on the difference between the current and previous logarithmic pitch. The better of the two output values is transmitted to the receiver. This scheme can exploit both redundancy in the signal and properties of the ear to achieve efficient quantization. Listening tests show that the proposed scheme allows the pitch parameter to be quantized using 4 bits, with no audible degradation in quality.
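The combination of an absolute and a differential uniform quantizer in the log pitch domain can be sketched as below. All numbers here are assumptions chosen for illustration, not values from the paper: a pitch period range of 20 to 147 samples, a differential range of plus or minus 0.07 in log units, and an even split of the 16 codewords of a 4 bit index into 8 absolute and 8 differential levels. The selection rule (smallest error in the log domain) is likewise an assumed reading of "the best of the two output values".

```python
import math


def uniform_levels(lo, hi, count):
    """Return `count` uniformly spaced reconstruction levels from lo to hi."""
    step = (hi - lo) / (count - 1)
    return [lo + i * step for i in range(count)]


# Hypothetical design values, not taken from the paper.
ABS_LEVELS = uniform_levels(math.log(20.0), math.log(147.0), 8)
DIFF_LEVELS = uniform_levels(-0.07, 0.07, 8)


def quantize_log_pitch(pitch, prev_pitch_hat,
                       abs_levels=ABS_LEVELS, diff_levels=DIFF_LEVELS):
    """Pick the codeword, absolute or differential, closest to log(pitch).

    prev_pitch_hat is the previously reconstructed pitch period, so the
    decoder can form the same differential candidates as the encoder.
    Returns (mode, index, reconstructed_pitch).
    """
    target = math.log(pitch)
    prev_log = math.log(prev_pitch_hat)
    candidates = [("abs", i, level) for i, level in enumerate(abs_levels)]
    candidates += [("diff", i, prev_log + d) for i, d in enumerate(diff_levels)]
    mode, index, best = min(candidates, key=lambda c: abs(c[2] - target))
    return mode, index, math.exp(best)
```

With these assumed settings, a small change such as a pitch period moving from 50 to 51 samples is captured by the fine differential levels, while a large jump falls back to the coarse absolute levels, which is one way the scheme can exploit the slowly varying nature of pitch.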

IC992329.PDF (From Author) IC992329.PDF (Rasterized)
