SP-15.1

Split Band CELP (SB-CELP) Speech Coder
Mohammad R Nakhai, Farokh A Marvasti (King's College, University of London)

In this paper, we discuss the split band code-excited linear prediction (SB-CELP) speech coder, which employs an iterative version of the harmonic sinusoidal coding algorithm to encode the periodic content of the speech signal. The speech spectrum is split into two frequency regions, one of harmonic and one of random components, and a reliable fundamental frequency is estimated for the harmonic region using both the speech spectrum and its linear predictive (LP) residual spectrum. The resulting sinusoidal parameters are interpolated to reconstruct the periodicity in the speech waveform. The level of periodicity is controlled by computing a cutoff frequency between the harmonic and random regions of the spectrum. The random part of the spectrum and unvoiced speech are processed using the CELP coding algorithm. The SB-CELP speech coder, which combines the powerful features of the sinusoidal and CELP coding algorithms, yields high quality synthetic speech at 4.05 kb/s.

SP-15.2

Log Amplitude Modeling of Sinusoids in Voiced Speech
Najam Malik, W. Harvey Holmes (School of Electrical Engineering, University of New South Wales, Australia.)

We present an algorithm for all-pole (envelope) modeling of the amplitudes of the sinusoids present in voiced speech segments, which works even when the number of sinusoids is very small, as occurs with high-pitched speakers. In contrast to previous methods, this algorithm minimizes a squared error criterion in the log amplitude domain rather than the amplitude domain, and so is better matched to the properties of the human auditory system. A weighted iterative approach is used to obtain near-optimal solutions to this otherwise nonlinear problem. This new frequency domain log amplitude modeling (LAM) algorithm gives impressive results, especially in the case of high-pitched female voices, where conventional linear prediction methods are inadequate. The algorithm can easily be generalized to develop pole-zero models.
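
As a rough illustration of the criterion described above, the following Python sketch evaluates a squared error in the log amplitude domain between measured sinusoid amplitudes and an all-pole envelope sampled at the sinusoid frequencies. The function names, the sign convention of the denominator polynomial, and the example values are assumptions made for illustration; the weighted iterative solver mentioned in the abstract is not shown.

```python
import cmath
import math

def allpole_magnitude(gain, coeffs, omega):
    """|H(e^{j*omega})| for H(z) = gain / (1 + sum_i coeffs[i-1] * z^-i).

    The sign convention of the denominator is an assumption.
    """
    denom = 1.0 + sum(a * cmath.exp(-1j * omega * (i + 1))
                      for i, a in enumerate(coeffs))
    return gain / abs(denom)

def log_amplitude_error(amps, omegas, gain, coeffs):
    """Squared error between log amplitudes and the log envelope."""
    return sum(
        (math.log(a) - math.log(allpole_magnitude(gain, coeffs, w))) ** 2
        for a, w in zip(amps, omegas)
    )

# Hypothetical example: three harmonics and a first-order envelope.
omegas = [0.3, 0.6, 0.9]          # radians per sample
coeffs = [-0.9]
amps = [allpole_magnitude(1.0, coeffs, w) for w in omegas]
err_exact = log_amplitude_error(amps, omegas, 1.0, coeffs)
err_scaled = log_amplitude_error([2 * a for a in amps], omegas, 1.0, coeffs)
```

Because the error is measured on logarithms, doubling every amplitude adds the same penalty at each harmonic regardless of its level, which is the property that distinguishes this criterion from an amplitude-domain error.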

SP-15.3

1.2kbit/s Harmonic Coder Using Auditory Filters
Minoru Kohata (Chiba Institute of Technology)

In this paper, a very low bit rate speech coder operating at 1.2 kbit/s is proposed. Like the LPC vocoder, it requires only gain, pitch, and spectral information, but its quality is far superior. The synthesis method is a form of harmonic coding, using sinusoids whose frequencies are multiples of the fundamental frequency, where the amplitudes of the sinusoids are adaptively modulated using Gammatone filters as a perceptual weighting filter. The phases of the sinusoids are also adjusted so as to maximize the perceptual quality. In order to reduce the total bit rate to 1.2 kbit/s, a new segment coder for spectral information (LSP coefficients) using DP matching is also proposed. The quality of the synthesized speech improved by 0.45 in Mean Opinion Score (MOS) compared with that of a simple LPC vocoder operating at the same rate, and it was comparable to that of a 2.4 kbit/s MELP coder.

SP-15.4

Exponential Sinusoidal Modeling of Transitional Speech Segments
Jesper Jensen, Søren H Jensen, Egon Hansen (CPK, Aalborg University)

A generalized sinusoidal model for speech signal processing is studied. The main feature of the model is that the amplitude of each sinusoidal component is allowed to vary exponentially with time. We propose to use the model in transitional speech segments such as speech onsets and voiced/unvoiced transitions. Computer simulations with natural speech signals indicate substantially better modeling performance in both transitional and voiced regions compared with the traditional constant-amplitude sinusoidal model.
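
The model described above can be sketched as a sum of sinusoids whose amplitudes are multiplied by an exponential in time. The parameterization below (initial amplitude, damping factor, angular frequency, phase) and the function name are assumptions made for illustration; the abstract does not specify how the parameters are estimated, so only synthesis is shown.

```python
import math

def exp_sinusoid_frame(components, n_samples):
    """Synthesize a frame as a sum of exponentially varying sinusoids.

    components: iterable of (amplitude, damping, omega, phase) tuples.
    A damping of 0 gives the constant-amplitude sinusoidal model;
    a positive value grows (e.g. an onset), a negative value decays.
    """
    components = list(components)
    frame = []
    for n in range(n_samples):
        sample = sum(
            a * math.exp(d * n) * math.cos(w * n + p)
            for a, d, w, p in components
        )
        frame.append(sample)
    return frame

# Hypothetical onset: one growing and one constant-amplitude component.
onset = exp_sinusoid_frame(
    [(0.1, 0.02, 0.2, 0.0), (0.5, 0.0, 0.4, 0.0)], 160
)
```

Setting every damping factor to zero recovers the constant-amplitude model used as the baseline in the comparison.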

SP-15.5

Harmonic+Noise Coding Using Improved V/UV Mixing and Efficient Spectral Quantization
Eric W. M. Yu, Cheung-Fat Chan (City University of Hong Kong)

This paper presents a harmonic+noise speech coder which uses an efficient spectral quantization technique and a novel voiced/unvoiced (V/UV) mixing model. The harmonic magnitudes are coded at 23 bits/frame using the magnitude response of a linear predictive coding (LPC) system. The difference between the harmonic magnitudes and the sampled magnitude response is minimized by a closed-loop approach. The V/UV mixing is modeled by a smooth function which is derived from the speech spectrum envelope based on a flatness measure. The V/UV mixing model allows noise to be added in the harmonic portion of the speech spectrum so that buzziness is reduced. Because the V/UV mixing information is determined from the spectral parameters available in the decoder, no bits are needed for transmitting the V/UV information. A 1.4 kbps harmonic coder is developed. The speech quality of the coder is comparable to that of other harmonic coders operating at higher rates.
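
The abstract does not state which flatness measure is used or how it is mapped to the mixing function. As a minimal sketch, assuming the common definition of spectral flatness (geometric mean over arithmetic mean of the power spectrum), the measure can be computed from envelope samples as follows; the function name and example spectra are hypothetical.

```python
import math

def spectral_flatness(power):
    """Geometric mean divided by arithmetic mean of a power spectrum.

    All values must be strictly positive. The result is 1.0 for a
    perfectly flat spectrum and approaches 0 for a peaky one.
    """
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))

flat = spectral_flatness([1.0, 1.0, 1.0, 1.0])      # noise-like envelope
peaky = spectral_flatness([100.0, 1.0, 1.0, 1.0])   # strongly resonant envelope
```

Since such a measure depends only on the spectral envelope, a decoder holding the same envelope parameters can recompute it without side information, which is consistent with the zero-bit V/UV signalling described in the abstract.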

SP-15.6

A 4 Kb/s Toll Quality Harmonic Excitation Linear Predictive Speech Coder
Suat Yeldener (COMSAT Laboratories, Clarksburg, Maryland, USA)

The Harmonic Excitation Linear Predictive Speech Coder (HE-LPC) is a technique derived from MBE and MB-LPC types of speech coding algorithms. The HE-LPC coder has the potential to produce high quality speech at 4.8 kb/s and below. This coder employs a new pitch estimation and voicing technique. In addition, new DCT based LPC and residual amplitude quantization techniques have been developed. The 4 kb/s HE-LPC coder with a 14th order LPC filter was found to produce much better speech quality than various low rate speech coding standards, including the 3.6 kb/s INMARSAT Mini-M AMBE vocoder. In a formal ITU ACR test, the 4 kb/s HE-LPC vocoder was found to produce performance equivalent to 32 kb/s ADPCM and G.729 for both flat and modified IRS filtered clean input speech conditions. The HE-LPC algorithm can also be extended to cover bit rates in the range of 1.2 to 8 kb/s, depending on the application.

SP-15.7

High Quality MELP Coding at Bit-Rates Around 4 kb/s
Jacek Stachurski, Alan McCree, Vishu Viswanathan (Texas Instruments)

Recently, a number of coding techniques have been reported to achieve near toll quality synthesized speech at bit-rates around 4 kb/s. These include variants of Code Excited Linear Prediction (CELP), Sinusoidal Transform Coding (STC) and Multi-Band Excitation (MBE). While CELP has been an effective technique for bit-rates above 6 kb/s, the STC, MBE, Waveform Interpolation (WI) and Mixed Excitation Linear Prediction (MELP) models seem to be attractive at bit-rates below 3 kb/s. In this paper, we present a system to encode speech with high quality using MELP, a technique previously demonstrated to be effective at bit-rates of 1.6 to 2.4 kb/s. We have enhanced the MELP model to produce significantly higher speech quality at bit-rates above 2.4 kb/s. We describe the development and testing of a high quality 4 kb/s MELP coder.

SP-15.8

Pitch Quantization in Low Bit-Rate Speech Coding
Thomas Eriksson, Hong-Goo Kang (AT&T Labs Research, SIPS, 180 Park Avenue, Florham Park, NJ 07932)

This paper describes a new pitch quantization method for low bit-rate speech coding systems. The logarithm of the pitch period is quantized by a combination of two uniform quantizers, one working directly on logarithmic pitch values and the other working on the difference between the current and previous logarithmic pitch. The better of the two output values is transmitted to the receiver. This scheme can exploit both redundancy in the signal and properties of the ear to achieve efficient quantization. Listening tests show that the proposed scheme allows the pitch parameter to be quantized using 4 bits, with no audible degradation in quality.
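
The scheme described above can be sketched in Python as follows. The abstract does not give the pitch range, the step sizes, or how the 4 bits are shared between the two quantizers, so every constant below is a hypothetical choice: the 16 codewords are split evenly, 8 for absolute log pitch and 8 for the log pitch difference. The encoder tries every codeword and keeps the one whose reconstruction is closest to the target.

```python
import math

# Hypothetical parameters; none of these values come from the paper.
P_MIN, P_MAX = 20.0, 160.0   # pitch period range in samples
N_ABS, N_DIFF = 8, 8         # 16 codewords in total, i.e. 4 bits
MAX_STEP = 0.2               # largest log pitch change the differential part covers

def uniform_levels(lo, hi, n):
    """Return n uniformly spaced reconstruction levels covering [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

ABS_LEVELS = uniform_levels(math.log(P_MIN), math.log(P_MAX), N_ABS)
DIFF_LEVELS = uniform_levels(-MAX_STEP, MAX_STEP, N_DIFF)

def encode(pitch, prev_log):
    """Return (codeword index, reconstructed log pitch) for one frame.

    prev_log is the previously *reconstructed* log pitch, so that the
    encoder and decoder predict from the same value.
    """
    target = math.log(pitch)
    candidates = list(enumerate(ABS_LEVELS))
    candidates += [(N_ABS + j, prev_log + d) for j, d in enumerate(DIFF_LEVELS)]
    return min(candidates, key=lambda c: abs(c[1] - target))

def decode(index, prev_log):
    """Reconstruct the log pitch from a codeword index."""
    if index < N_ABS:
        return ABS_LEVELS[index]
    return prev_log + DIFF_LEVELS[index - N_ABS]

prev = math.log(50.0)
slow_idx, slow_rec = encode(52.0, prev)    # small change between frames
jump_idx, jump_rec = encode(150.0, prev)   # large pitch jump
```

With these assumed settings, a small frame-to-frame change is best served by the fine differential levels, while a large jump falls back to the coarse absolute levels, which is the behaviour the "best of two" selection is meant to provide.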


Last Update: February 4, 1999 Ingo Höntsch