ICASSP99 Low Bit Rate Speech Coding II

Low Bit Rate Speech Coding II
Split Band CELP (SB-CELP) Speech Coder
Authors: Mohammad Reza Nakhai, Farokh A Marvasti
Paper number 1257
Abstract: In this paper, we discuss the split band code-excited linear prediction (SB-CELP) speech coder, which employs an iterative version of the harmonic sinusoidal coding algorithm to encode the periodic content of the speech signal. The speech spectrum is split into two frequency regions, one of harmonic and one of random components, and a reliable fundamental frequency is estimated for the harmonic region using both the speech spectrum and its linear predictive (LP) residual spectrum. The resulting sinusoidal parameters are interpolated to reconstruct the periodicity in the speech waveform. The level of periodicity is controlled by computing a cutoff frequency between the harmonic and random regions of the spectrum. The random part of the spectrum and unvoiced speech are processed using the CELP coding algorithm. By combining the strengths of the sinusoidal and CELP coding algorithms, the SB-CELP speech coder yields high-quality synthetic speech at 4.05 kb/s.

Log Amplitude Modeling of Sinusoids in Voiced Speech
Authors: Najam Malik, W. Harvey Holmes (School of Electrical Engineering, University of New South Wales, Australia)
Paper number 1278
Abstract: We present an algorithm for all-pole (envelope) modeling of the amplitudes of the sinusoids present in voiced speech segments. The algorithm works even when the number of sinusoids is very small, as occurs with high-pitched speakers. In contrast to previous methods, it minimizes a squared-error criterion in the log-amplitude domain rather than the amplitude domain, and so is better matched to the properties of the human auditory system. A weighted iterative approach is used to obtain near-optimal solutions to this otherwise nonlinear problem.
This new frequency-domain log amplitude modeling (LAM) algorithm gives impressive results, especially for high-pitched female voices, where conventional linear prediction methods are inadequate. The algorithm can easily be generalized to pole-zero models.

1.2kbit/s Harmonic Coder Using Auditory Filters
Authors: Minoru Kohata
Paper number 1356
Abstract: In this paper, a new very low bit rate speech coder operating at 1.2 kbit/s is proposed. Like the LPC vocoder, it requires only gain, pitch, and spectral information, but its quality is far superior. Synthesis uses harmonic coding: sinusoids whose frequencies are multiples of the fundamental frequency, with amplitudes adaptively modulated using Gammatone filters as a perceptual weighting filter. The phases of the sinusoids are also adjusted to maximize perceptual quality. To reduce the total bit rate to 1.2 kbit/s, a new segment coder for the spectral information (LSP coefficients) based on DP matching is also proposed. The Mean Opinion Score (MOS) of the synthesized speech was 0.45 higher than that of a simple LPC vocoder operating at the same rate, and comparable to that of the 2.4 kbit/s MELP coder.

Exponential Sinusoidal Modeling of Transitional Speech Segments
Authors: Jesper Jensen, Søren Holdt Jensen, Egon Hansen
Paper number 1446
Abstract: A generalized sinusoidal model for speech signal processing is studied. The main feature of the model is that the amplitude of each sinusoidal component is allowed to vary exponentially with time. We propose using the model in transitional speech segments such as speech onsets and voiced/unvoiced transitions.
Computer simulations with natural speech signals indicate substantially better modeling performance in both transitional and voiced regions compared with the traditional constant-amplitude sinusoidal model.

Harmonic+Noise Coding Using Improved V/UV Mixing and Efficient Spectral Quantization
Authors: Eric W. M. Yu, Cheung-Fat Chan (City University of Hong Kong, Hong Kong)
Paper number 1596
Abstract: This paper presents a harmonic+noise speech coder that uses an efficient spectral quantization technique and a novel voiced/unvoiced (V/UV) mixing model. The harmonic magnitudes are coded at 23 bits/frame using the magnitude response of a linear predictive coding (LPC) system, and the difference between the harmonic magnitudes and the sampled magnitude response is minimized by a closed-loop approach. The V/UV mixing is modeled by a smooth function derived from the speech spectral envelope based on a flatness measure. This model allows noise to be added in the harmonic portion of the speech spectrum, which reduces buzziness. Because the V/UV mixing information is determined from spectral parameters already available in the decoder, no bits are needed to transmit it. A 1.4 kbps harmonic coder is developed, and its speech quality is comparable to that of other harmonic coders operating at higher rates.

A 4 Kb/s Toll Quality Harmonic Excitation Linear Predictive Speech Coder
Authors: Suat Yeldener (COMSAT Laboratories, Clarksburg, Maryland, USA)
Paper number 1731
Abstract: The Harmonic Excitation Linear Predictive speech coder (HE-LPC) is derived from MBE- and MB-LPC-type speech coding algorithms. The HE-LPC coder has the potential to produce high-quality speech at 4.8 kb/s and below. It employs a new pitch estimation and voicing technique.
In addition, new DCT-based LPC and residual amplitude quantization techniques have been developed. The 4 kb/s HE-LPC coder with a 14th-order LPC filter was found to produce much better speech quality than various low-rate speech coding standards, including the 3.6 kb/s INMARSAT Mini-M AMBE vocoder. In a formal ITU ACR test, the 4 kb/s HE-LPC vocoder was found to perform equivalently to 32 kb/s ADPCM and G.729 for both flat and modified IRS filtered clean input speech. The HE-LPC algorithm can also be extended to bit rates in the 1.2 to 8 kb/s range, depending on the application.

High Quality MELP Coding at Bit-Rates Around 4 kb/s
Authors: Jacek Stachurski, Alan V McCree, Vishu R Viswanathan
Paper number 2072
Abstract: Recently, a number of coding techniques have been reported to achieve near-toll-quality synthesized speech at bit rates around 4 kb/s. These include variants of Code Excited Linear Prediction (CELP), Sinusoidal Transform Coding (STC) and Multi-Band Excitation (MBE). While CELP has been an effective technique at bit rates above 6 kb/s, STC, MBE, Waveform Interpolation (WI) and Mixed Excitation Linear Prediction (MELP) models seem attractive at bit rates below 3 kb/s. In this paper, we present a system that encodes speech with high quality using MELP, a technique previously demonstrated to be effective at bit rates of 1.6 to 2.4 kb/s. We have enhanced the MELP model to produce significantly higher speech quality at bit rates above 2.4 kb/s, and we describe the development and testing of a high-quality 4 kb/s MELP coder.

Pitch Quantization in Low Bit-Rate Speech Coding
Authors: Thomas Eriksson, Hong-Goo Kang
Paper number 2329
Abstract: This paper describes a new pitch quantization method for low bit-rate speech coding systems.
The logarithm of the pitch period is quantized with a combination of two uniform quantizers: one operates directly on the logarithmic pitch value, and the other operates on the difference between the current and previous logarithmic pitch values. The better of the two outputs is transmitted to the receiver. This scheme can exploit both the redundancy in the signal and the properties of the ear to achieve efficient quantization. Listening tests show that the proposed scheme allows the pitch parameter to be quantized with 4 bits without audible degradation.
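The pitch quantization abstract above describes the scheme only at a high level. The following Python sketch is a hypothetical illustration of the general idea, not the authors' implementation. It quantizes the log pitch period with an absolute uniform quantizer and with a differential uniform quantizer, then keeps whichever reconstruction is closer in the log domain. Several details are assumptions made for illustration and are not taken from the paper. These include the class names `UniformQuantizer` and `LogPitchQuantizer`, the pitch range of 20 to 147 samples, the 8/8 split of the 16 codewords of a 4-bit index, and the ±0.1 differential range. Signalling the chosen mode implicitly through a shared index space is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class UniformQuantizer:
    """Uniform quantizer with `levels` cells on [lo, hi] and cell-centre outputs."""
    lo: float
    hi: float
    levels: int

    @property
    def step(self) -> float:
        return (self.hi - self.lo) / self.levels

    def index(self, x: float) -> int:
        # Out-of-range inputs are clamped to the first or last cell.
        i = math.floor((x - self.lo) / self.step)
        return min(max(i, 0), self.levels - 1)

    def value(self, i: int) -> float:
        return self.lo + (i + 0.5) * self.step


class LogPitchQuantizer:
    """Hypothetical absolute/differential log-pitch quantizer sharing one index space.

    Indices 0..n_abs-1 select the absolute quantizer; indices n_abs..n_abs+n_diff-1
    select the differential quantizer, applied to the previously decoded log pitch.
    Encoder and decoder each keep their own instance so their states stay in step.
    """

    def __init__(self, p_min=20.0, p_max=147.0, n_abs=8, n_diff=8, d_max=0.1, bits=4):
        assert n_abs + n_diff <= 2 ** bits, "codebook does not fit in the bit budget"
        self.abs_q = UniformQuantizer(math.log(p_min), math.log(p_max), n_abs)
        self.diff_q = UniformQuantizer(-d_max, d_max, n_diff)
        self.n_abs = n_abs
        self.prev = None  # last reconstructed log pitch, identical at both ends

    def _reconstruct(self, idx: int) -> float:
        if idx < self.n_abs:
            return self.abs_q.value(idx)
        return self.prev + self.diff_q.value(idx - self.n_abs)

    def encode(self, pitch_period: float) -> int:
        target = math.log(pitch_period)
        candidates = [self.abs_q.index(target)]
        if self.prev is not None:  # differential mode needs a previous frame
            candidates.append(self.n_abs + self.diff_q.index(target - self.prev))
        # Transmit whichever candidate lands closer to the target in the log domain.
        best = min(candidates, key=lambda i: abs(self._reconstruct(i) - target))
        self.prev = self._reconstruct(best)
        return best

    def decode(self, idx: int) -> float:
        self.prev = self._reconstruct(idx)
        return math.exp(self.prev)


if __name__ == "__main__":
    enc, dec = LogPitchQuantizer(), LogPitchQuantizer()
    for p in [60.0, 61.0, 62.5, 63.0, 90.0, 91.0]:  # pitch periods in samples
        idx = enc.encode(p)
        print(f"{p:6.1f} -> index {idx:2d} -> {dec.decode(idx):6.2f}")
```

The differential reference is the previously reconstructed value, not the true previous pitch. This keeps the decoder's reference identical to the encoder's, so quantization errors do not accumulate from frame to frame. With these made-up settings, the absolute mode alone bounds the log-domain error by half its step (about 0.125, or roughly 13% in pitch period). The differential mode refines the estimate during steady voiced segments.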