TOPICS IN SPEECH CODING

Chair: Peter Kroon, AT&T Bell Laboratories (USA)

Improving 16 kb/s G.728 LD-CELP Speech Coder for Frame Erasure Channels

Authors:

Craig R. Watkins, AT&T Bell Laboratories (USA)
Juin-Hwey Chen, AT&T Bell Laboratories (USA)

Volume 1, Page 241

Abstract:

We have improved G.728 output speech quality for frame erasure channels. Three cases are considered: (1) no change to G.728, (2) change only the G.728 decoder, and (3) change both the encoder and decoder. In case 1, we synthesize a bit-stream during erased frames so that the decoder decodes an excitation with low energy or with characteristics similar to the excitation of previous good frames. In case 2, the gain-scaled excitation and LPC coefficients are extrapolated, and vital operations of backward LPC and gain adaptations are continued. Case 3 adds spectral smoothing and increases bandwidth expansion for the LPC and gain predictors. These techniques are quite effective, as the speech quality degradation due to 1% frame erasures ranges from just slightly noticeable in case 1 to almost unnoticeable in case 3. For case 3, the output speech is still intelligible for frame erasure rates up to 10% or even 20%.
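
As an illustration of the bandwidth expansion mentioned for case 3, the usual textbook form scales the i-th predictor coefficient by gamma**i with 0 < gamma < 1, so a smaller gamma gives more expansion. The sketch below assumes that form; the function name, the coefficients, and the gamma value are hypothetical and are not taken from the paper or from G.728.

```python
def bandwidth_expand(lpc, gamma):
    """Return predictor coefficients a_1..a_p scaled by gamma**i.

    Assumption: the common form a_i' = gamma**i * a_i, which pulls the
    poles of the synthesis filter toward the origin.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    return [a * gamma ** (i + 1) for i, a in enumerate(lpc)]


# Hypothetical third-order predictor, for illustration only.
coeffs = [1.2, -0.5, 0.1]
expanded = bandwidth_expand(coeffs, 0.5)
```

With gamma = 1 the coefficients are returned unchanged, which makes it easy to compare the expanded and unexpanded filters.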

300dpi TIFF Images of pages:

241 242 243 244

Acrobat PDF file of whole paper:

ic950241.pdf

Reconstruction of Missing Packets for CELP-Based Speech Coders

Authors:

Aamir Husain, Simon Fraser University (CANADA)
Vladimir Cuperman, Simon Fraser University (CANADA)

Volume 1, Page 245

Abstract:

A common aspect of speech transmission through packetised networks is the need to consider the discarded (missing) packets as a result of error detection or network overload. The missing packets and the possible mistracking that results in the speech decoder lead to significant quality degradation. In this paper, we introduce a packet recovery technique for CELP-based speech coders. The proposed technique independently extrapolates the excitation signal and the short-term synthesis filter. A recovery strategy based on speech classification (voiced, unvoiced, transition, silence) is discussed. The extrapolation of the short-term filter uses a least-squares fading memory polynomial filter applied to reflection coefficients. Objective and subjective quality evaluations of the recovery system applied to the LD-CELP G.728 standard and a variable rate CELP system for random and burst frame erasures are presented. The results indicate that the system is robust up to a frame erasure rate of 10%. Very little degradation in quality was observed at erasure rates up to 3%.
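
The abstract does not give the order or fading factor of the polynomial filter, so the following is only a rough sketch of the general idea: fit a straight line to the recent history of one reflection coefficient by least squares with exponentially decaying weights, then evaluate the line one frame ahead. The batch formulation, the function name, the fading factor, and the clipping limit are all assumptions made for illustration.

```python
def extrapolate_fading(history, fade=0.8, limit=0.999):
    """Predict the next value from `history` (oldest first).

    Assumption: a degree-1 fit with weights fade**age, so older frames
    count less. The result is clipped to (-limit, limit) because a
    reflection coefficient must stay inside (-1, 1) for a stable filter.
    """
    n = len(history)
    if n == 0:
        raise ValueError("history must not be empty")
    if n == 1:
        return max(-limit, min(limit, history[0]))
    weights = [fade ** (n - 1 - k) for k in range(n)]
    sw = sum(weights)
    st = sum(w * k for k, w in enumerate(weights))
    sy = sum(w * y for w, y in zip(weights, history))
    stt = sum(w * k * k for k, w in enumerate(weights))
    sty = sum(w * k * y for k, (w, y) in enumerate(zip(weights, history)))
    # Closed-form weighted least-squares line through (k, history[k]).
    slope = (sw * sty - st * sy) / (sw * stt - st * st)
    intercept = (sy - slope * st) / sw
    prediction = intercept + slope * n
    return max(-limit, min(limit, prediction))


# Hypothetical history of one reflection coefficient over four good frames.
next_k = extrapolate_fading([0.1, 0.2, 0.3, 0.4])
```

In a real decoder the same extrapolation would be applied to each reflection coefficient separately before converting back to filter coefficients.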

300dpi TIFF Images of pages:

245 246 247 248

Acrobat PDF file of whole paper:

ic950245.pdf

A Robust Variable-Rate Speech Coder

Authors:

A. Shen, University of California - Los Angeles (USA)
B. Tang, University of California - Los Angeles (USA)
A. Alwan, University of California - Los Angeles (USA)
G. Pottie, University of California - Los Angeles (USA)

Volume 1, Page 249

Abstract:

The goal of this study is to develop a robust and high-quality speech coder for wireless communication. The proposed coder is a perceptually-based variable-rate subband coder. The perceptual metric ensures that encoding is optimized to the human listener and is based on calculating the signal-to-mask ratio in short-time frames of the input signal. An adaptive bit allocation scheme is employed and the subband energies are then quantized using a Max-Lloyd quantizer. The coder is fully scalable: increasing the bit rate improves the quality of the encoded speech. Subjective listening tests, using quiet and noisy input signals, indicate that the proposed coder produces high-quality speech when operating at 12 kbps or higher. In error-free conditions, our coder has comparable performance to that of QCELP and GSM coders. For speech in background noise, however, our coder, at 12 kbps, outperforms QCELP significantly, and for music, it outperforms both QCELP and GSM.
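
The abstract does not spell out the allocation rule, so the sketch below shows one common way to drive bit allocation from signal-to-mask ratios: a greedy loop that repeatedly gives one bit to the subband whose quantization noise is currently most audible, assuming each added bit lowers the noise by about 6.02 dB. The function name, the per-band cap, and the example ratios are hypothetical.

```python
def allocate_bits(smr_db, total_bits, max_bits_per_band=15):
    """Greedily distribute `total_bits` across subbands.

    Assumption: noise-to-mask ratio = SMR - 6.02 dB per allocated bit,
    and the band with the largest ratio receives the next bit.
    """
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        nmr = [
            smr - 6.02 * b if b < max_bits_per_band else float("-inf")
            for smr, b in zip(smr_db, bits)
        ]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] == float("-inf"):
            break  # every band has reached its cap
        bits[worst] += 1
    return bits


# Hypothetical signal-to-mask ratios (dB) for three subbands.
allocation = allocate_bits([20.0, 5.0, -3.0], total_bits=4)
```

Raising `total_bits` only ever adds bits to the existing allocation, which is one simple way a coder of this kind can scale with bit rate.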

300dpi TIFF Images of pages:

249 250 251 252

Acrobat PDF file of whole paper:

ic950249.pdf

Wideband Speech Coding Using Multiple Codebooks and Glottal Pulses

Authors:

C. McElroy, University College - Dublin (IRELAND)
B.P. Murray, University College - Dublin (IRELAND)
A.D. Fagan, University College - Dublin (IRELAND)

Volume 1, Page 253

Abstract:

We propose a coder that achieves near-transparent wideband speech coding by parameterising the prediction residual through the use of multiple codebooks and synthetic glottal pulses coupled with adaptive bit allocation. The use of synthetic glottal pulses improves the performance of the coder compared to a previous coder using a single impulse, without increasing the bit rate. This multiple codebook approach results in a coder operating at 16 kb/s and 24 kb/s that provides speech quality comparable to that of the CCITT G.722 coder operating at 64 kb/s.

300dpi TIFF Images of pages:

253 254 255 256

Acrobat PDF file of whole paper:

ic950253.pdf

Speech Coding Using ISI Coded Quantization

Authors:

Nam Phamdo, SUNY
Cheng-Chieh Lee, University of Maryland
Rajiv Laroia, AT&T Bell Laboratories (USA)

Volume 1, Page 257

Abstract:

We describe a speech coder based on the intersymbol interference coded quantizer (ICQ). The ICQ is a structured vector quantizer that can realize both boundary and granular gains for sources with memory. It is the quantization dual of the intersymbol interference coder, a transmission scheme for channels with memory. We have studied two different suboptimal ICQ codebook search algorithms for speech coding and find that the performance of the ICQ-based speech coder is very good at rates over 13 kbps but degrades rapidly at lower rates.

300dpi TIFF Images of pages:

257 258 259 260

Acrobat PDF file of whole paper:

ic950257.pdf

New Techniques for Multi-prototype Waveform Coding at 2.84 kb/s

Authors:

I.S. Burnett, University of Wollongong (AUSTRALIA)
G.J. Bradley, University of Wollongong (AUSTRALIA)

Volume 1, Page 261

Abstract:

This paper describes new techniques for Prototype Waveform (PW) coding at coding rates as low as 2.84 kb/s. The algorithm produces good communications quality speech with significant improvements over previously reported PW coding schemes. In Multi-Prototype Waveform (MPW) coding, prototypes are extracted at 2.5 ms intervals. No distinction is necessary between voiced and unvoiced speech since the normalised LP residual prototypes are coded as a combination of a noise vector and a smoothly evolving pitch pulse. At low bit rates it is unnecessary to explicitly code the underlying pulse shape; the relative magnitude of the rapidly evolving noise vectors is sufficient to describe the level of periodicity in each prototype. Prototypes are quantised using either an open-loop or a closed-loop scheme, with similar results. Quantised prototypes are interpolated by continuous phase interpolation of Fourier coefficients in the discrete frequency domain.

300dpi TIFF Images of pages:

261 262 263 264

Acrobat PDF file of whole paper:

ic950261.pdf

Quantization of Non-Linear Predictors in Speech Coding

Authors:

Jes Thyssen, Tele Danmark Research
Henrik Nielsen, Tele Danmark Research
Steffen Duus Hansen, Technical University of Denmark (DENMARK)

Volume 1, Page 265

Abstract:

In this paper we focus on how to exploit the non-linearities in speech with the main purpose of improving the prediction in speech coders. If non-linearities are absent from speech, the linear technique is sufficient, but if non-linearities are present, the technique is inadequate and more sophisticated predictors are called for. In our ICASSP-94 paper we gave evidence for non-linearities in speech and presented two non-linear short-term predictors that were both superior to the linear predictor without quantization. In this paper we present methods to design vector quantizers for the non-linear predictors and investigate how vector quantization of the non-linear predictors affects prediction. Furthermore, we compare the performance of the quantized non-linear predictors to the performance of traditional quantized linear predictors. The experiments show that 10-bit VQ of the non-linear predictor leads to performance similar to that of 20-bit state-of-the-art split VQ of the LSP parameters.

300dpi TIFF Images of pages:

265 266 267 268

Acrobat PDF file of whole paper:

ic950265.pdf

A Fast Robust Stochastic Algorithm for Vector Quantizer Design for Nonstationary Channels

Authors:

B. Kovesi, ENST/Bretagne (FRANCE)
S. Saoudi, ENST/Bretagne (FRANCE)
J.M. Boucher, ENST/Bretagne (FRANCE)
Z. Reguly, Technical University of Budapest (HUNGARY)

Volume 1, Page 269

Abstract:

In this paper we present the development of the RGSKAe, a new algorithm for designing vector quantizers. The main features of this algorithm are the following:

- Due to its stochastic nature, it avoids being trapped in poor local minima.
- No initial codebook is needed; the codevectors move away from the gravity centre of the training vectors towards their final positions.
- Source coding and channel coding are jointly optimized to obtain a codebook robust against different levels of transmission noise.
- The resulting codebook always performs as well as or better than existing codebooks designed for noisy or noiseless channels.
- The computational complexity is only slightly higher than that of the most widely used K-means algorithm.
- The Bootstrap sampling technique can be successfully applied in the case of a large training set.
- The method is suitable for parallel implementation.
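
For context, the K-means baseline that the abstract compares against can be sketched as below. This is not the RGSKAe, whose update rules are not given in the abstract; it is plain K-means codebook design, shown mainly because it needs the initial codebook that the RGSKAe is said to avoid. The function names, the evenly spaced initialisation, and the training vectors are assumptions made for illustration.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_codebook(vectors, k, iterations=20):
    """Design a k-entry codebook with plain K-means.

    Assumption: the initial codebook is k evenly spaced training
    vectors. K-means is sensitive to this choice and can settle in a
    poor local minimum.
    """
    codebook = [list(vectors[i * len(vectors) // k]) for i in range(k)]
    for _ in range(iterations):
        cells = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: sq_dist(v, codebook[i]))
            cells[nearest].append(v)
        for i, cell in enumerate(cells):
            if cell:  # keep the old codevector if its cell is empty
                codebook[i] = [sum(c) / len(cell) for c in zip(*cell)]
    return codebook


# Hypothetical two-dimensional training set with two obvious clusters.
training = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = sorted(kmeans_codebook(training, k=2))
```

Each iteration costs one nearest-neighbour search per training vector, which is the complexity the abstract uses as its reference point.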

300dpi TIFF Images of pages:

269 270 271 272

Acrobat PDF file of whole paper:

ic950269.pdf

Voice Quality of Interconnected PCS, Japanese Cellular, and Public Switched Telephone Networks

Authors:

Spiros Dimolitsas, COMSAT Laboratories
Franklin L. Corcoran, COMSAT Laboratories
Channasandra Ravishanker, COMSAT Laboratories
Marion Baraniecki, INTELSAT (USA)

Volume 1, Page 273

Abstract:

The non-linear nature of low-rate parametric speech coding has made it necessary to resort to formal subjective assessments for quantifying end-to-end voice quality of interconnected networks. At the same time, the rapid growth of cellular communications has highlighted the need to characterize transmission quality when cellular terminals are attached at the access or termination nodes of switched networks. In this paper the voice quality of interconnected North-American and Japanese digital cellular systems over public transmission facilities is quantified. From these assessments it was concluded that cellular networks using 8 kbit/s or 6.4 kbit/s VSELP may meet end-to-end quantization distortion criteria when interconnected with the switched network.

300dpi TIFF Images of pages:

273 274 275 276

Acrobat PDF file of whole paper:

ic950273.pdf

Objective Speech Measure for Chinese in Wireless Environment

Authors:

K.H. Lam, Hong Kong University of Science & Technology (HONG KONG)
O.C. Au, Hong Kong University of Science & Technology (HONG KONG)
C.C. Chan, Hong Kong University of Science & Technology (HONG KONG)
K.F. Hui, Hong Kong University of Science & Technology (HONG KONG)
S.F. Lau, Hong Kong University of Science & Technology (HONG KONG)

Volume 1, Page 277

Abstract:

The cellular phone is becoming an important means of mobile wireless communication, especially in metropolitan areas. One of the important operating considerations for cellular phone service providers is maintaining good speech quality across the cellular phone network. Subjective evaluation by repeated listening tests at various sites within the coverage area is impractical because it is intrinsically laborious and expensive. As a result, it would be highly desirable to have an automatic objective evaluation system which applies a good objective speech measure to estimate the statistical average of subjective opinions of typical conversational speech sentences sent through the cellular network. While extensive work has been done on objective speech measures for languages such as English, Japanese, French, and other Western languages, little has been done for Chinese. In addition, little has been done to quantify speech quality in the wireless environment.

300dpi TIFF Images of pages:

277 278 279 280

Acrobat PDF file of whole paper:

ic950277.pdf