Chair: Costas Xydeas, University of Manchester (UK)
Minjie Xie, University of Sherbrooke (CANADA)
Jean-Pierre Adoul, University of Sherbrooke (CANADA)
This paper presents an algebraic vector-quantization scheme for encoding the LSF parameters used in describing the time-varying short-term spectrum of speech in many modern vocoders. The quantizer achieves an average spectral distortion of 1 dB at 28 bits/frame for the telephone bandwidth. The scheme is based on low-dimensionality regular-point lattices. Properties of lattices are taken advantage of in both the design and the search of the quantizer codebook. Namely, this algebraic codebook need not be stored in memory and the optimum vector is found through simple rounding of the input variables instead of the usual exhaustive search. Thus, the scheme results in significant savings of memory and reduced computational complexity when compared to traditional vector-quantizer solutions.
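The "search by rounding" idea can be sketched as follows. The abstract does not name the lattice used, so this example assumes the D_n lattice (integer vectors whose coordinates sum to an even number) purely for illustration; the function names are hypothetical and this is not the authors' quantizer.

```python
import math


def round_nearest(v):
    """Round to the nearest integer (ties go upward)."""
    return math.floor(v + 0.5)


def nearest_dn(x):
    """Nearest point of D_n: integer vectors with an even coordinate sum.

    Assumption: D_n stands in for the unspecified lattice of the abstract.
    No codebook is stored; the code vector is computed from the input.
    """
    y = [round_nearest(v) for v in x]
    if sum(y) % 2 == 0:
        return y
    # Odd sum: re-round the coordinate with the largest rounding error
    # in the opposite direction, which restores an even sum.
    k = max(range(len(x)), key=lambda i: abs(x[i] - y[i]))
    y[k] += 1 if x[k] > y[k] else -1
    return y


print(nearest_dn([0.6, 0.1, -0.2, 1.4]))  # [1, 0, 0, 1]
print(nearest_dn([0.6, 0.1, -0.2, 0.2]))  # [0, 0, 0, 0]
```

The cost is one rounding per coordinate plus at most one correction, independent of the number of code vectors, which is the source of the memory and complexity savings the abstract describes.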
H.R. Sadegh Mohammadi, University of New South Wales (AUSTRALIA)
W.H. Holmes, University of New South Wales (AUSTRALIA)
In low rate speech coders based on the linear prediction method, the quality of synthesized speech can be improved by enhancement of the short-term spectrum quantization stage. In this study, we propose two new efficient methods for coding the spectral parameters, namely sorted codebook vector quantization (SCVQ) and fine-coarse vector quantization (FCVQ). The principles of these methods are presented along with the methods of training and optimizing the related codebooks. The performance of the new schemes is compared experimentally with other efficient methods, such as tree-searched vector quantization (TSVQ) and multi-stage vector quantization (MSVQ). We demonstrate that the new methods offer significant cost reduction whilst achieving superior quality.
Stefan Bruhn, Technical University of Berlin (GERMANY)
Efficient block coding methods for LPC information play an essential role in very-low-rate speech coding systems. The subject of this contribution is a new suboptimal matrix quantization scheme for LPC parameters, called matrix product quantization (MPQ), which operates at bit rates between 300 and 700 b/s. MPQ encodes sequences of LPC parameter vectors using a product formulation of two matrices which describe the average parameter vector and the temporal contour. In fixed-rate coding systems for mobile communication, MPQ achieves a very high coding efficiency at a low coding delay. Compared to the multi-frame coding method (MFC) of Kemp et al., which causes a delay of 8 frames, the MPQ scheme operates more efficiently even at a coding delay of only 3 frames. When MPQ is applied to a variable-rate segment vocoder, a bit rate reduction of 50% compared to memoryless VQ is obtained at a frame period of 20 ms.
A. Goalic, ENST-Bretagne (FRANCE)
S. Saoudi, ENST-Bretagne (FRANCE)
The Code Excited Linear Prediction (CELP) coder makes it possible to synthesize good quality speech at low bit rates. In such a case, speech quality mainly depends on spectral envelope design accuracy. Different kinds of parameters belonging to the parametrical domain, to the time domain and to the frequency domain (LSP parameters) are used to design the vocal tract transfer function. The latter present interesting properties both for quantization and interpolation, confirming their increasing use in low bit rate speech coding. In real time processing, efficient methods must be used to compute this set of parameters. The purpose of this paper is to show attractive properties of Line Spectrum Pair (LSP) computation in real time processing. The possibility of computing each LSP parameter independently characterizes the intrinsic reliability of this method based on the use of the Split Levinson Algorithm. This method is compared to the one using the Chebyshev polynomials.
H. Petter Knagenhjelm, AT&T Bell Laboratories (USA)
W. Bastiaan Kleijn, AT&T Bell Laboratories (USA)
Linear prediction coefficients are used to describe the power-spectrum envelope in the majority of low-bit-rate coders. The performance of quantizers for the linear-prediction coefficients is generally evaluated in terms of spectral distortion. This paper shows that the audible distortion in low-bit-rate coders is often more a function of the dynamics of the power-spectrum envelope than of the spectral distortion as usually evaluated. Smoothing the evolution of the power-spectrum envelope over time increases the reconstructed speech quality. A reasonable objective is to find the smoothest path that keeps the quantized parameters within the Voronoi regions associated with the transmitted quantization index. We demonstrate increased quantizer performance by such smoothing of the line-spectral frequencies.
Dong-il Chang, Seoul National University (KOREA)
Young-kwon Cho, Seoul National University (KOREA)
Souguil Ann, Seoul National University (KOREA)
In this paper, we propose a classified split VQ (SVQ) of line spectral frequency (LSF) parameters combined with conditional splitting. The proposed algorithm adopts an independent conditional splitting scheme instead of the conventional fixed splitting scheme for each class. Considering the perceptual and spectral sensitivity characteristics of LSF's, we define an LSF perceptual importance index (LPII) to represent the relative perceptual importance of each one. Experimental results have shown that the proposed algorithm, conditional split VQ (CONSVQ), can achieve a reduction of 37.5% in searching complexity while maintaining the performance of quantization. From these results, we have found that the performance of VQ can be enhanced by considering and using the difference in relative importance of LSF's.
C.S. Xydeas, University of Manchester (UK)
C. Papanastasiou, University of Manchester (UK)
This paper presents a new and efficient LPC quantisation scheme called Split Matrix Quantisation (SMQ). The proposed method can be viewed as an extension of the conventional Split Vector Quantisation process. It operates over N consecutive LPC frames and effectively divides a p by N LSP matrix into K submatrices which are then vector quantised independently. SMQ exploits the interframe redundancy that exists between consecutive sets of LSP coefficients and achieves "transparent" quantisation at 900 bits/sec. "High quality" LSP quantisation can be easily obtained at 750 bits/sec. These bit rates are based on a 20 msec LPC analysis frame size. Furthermore, SMQ is characterised by relatively low complexity and low storage requirements.
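The bit rates and the matrix split can be made concrete with a short sketch. The frame-rate arithmetic follows directly from the figures in the abstract; the split into contiguous row bands, the values p = 10, N = 4, K = 3 and the band sizes are assumptions chosen only for illustration, since the abstract does not say how the submatrices are formed.

```python
def bits_per_frame(rate_bps, frame_ms):
    """Bits available per analysis frame at a given bit rate."""
    return rate_bps * frame_ms / 1000


def split_lsp_matrix(matrix, band_sizes):
    """Split a p-by-N matrix (rows: LSPs, columns: frames) into row bands.

    Assumption: submatrices are contiguous groups of rows, each flattened
    into one vector that would be vector quantised on its own.
    """
    if sum(band_sizes) != len(matrix):
        raise ValueError("band sizes must add up to the number of rows p")
    subvectors, start = [], 0
    for size in band_sizes:
        band = matrix[start:start + size]
        subvectors.append([value for row in band for value in row])
        start += size
    return subvectors


print(bits_per_frame(900, 20))  # 18.0 bits per 20 msec frame
print(bits_per_frame(750, 20))  # 15.0 bits per 20 msec frame

p, n_frames = 10, 4  # hypothetical sizes
lsp_matrix = [[0.0] * n_frames for _ in range(p)]
subvectors = split_lsp_matrix(lsp_matrix, [3, 3, 4])
print([len(v) for v in subvectors])  # [12, 12, 16]
```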
John S. Collura, U.S. Department of Defense (USA)
Thomas E. Tremain, U.S. Department of Defense (USA)
A growing number of state of the art speech coding algorithms use vector quantization (VQ) to quantize spectrum information. VQ code books are created from a set of training vectors which are drawn from and representative of the overall data being quantized. These training vectors are partitioned into a set of clusters whose centroids represent the region of the partition and are called code vectors. Of specific interest to this paper is the ratio, beta, of the number of training vectors to the number of code vectors. The goal of this paper is to provide guidance on appropriate levels of training data regardless of code book size. Of particular significance is the empirical determination of a minimum beta value of 128 training vectors per code vector for full vector code books.
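As a rough illustration of what a ratio of 128 implies, the sketch below computes the training-set size for a codebook of a given size. The function names are hypothetical, and the 10-bit codebook is an example value not taken from the paper.

```python
def min_training_vectors(codebook_bits, beta=128):
    """Training vectors needed for a codebook of 2**codebook_bits code vectors."""
    return beta * (2 ** codebook_bits)


def training_ratio(num_training_vectors, num_code_vectors):
    """The ratio beta: training vectors per code vector."""
    return num_training_vectors / num_code_vectors


print(min_training_vectors(10))      # 131072 vectors for 1024 code vectors
print(training_ratio(131072, 1024))  # 128.0
```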
Roar Hagen, Chalmers University of Technology (SWEDEN)
Erdal Paksoy, University of California (USA)
Allen Gersho, University of California (USA)
Variable rate quantization of the linear predictive coding (LPC) parameters based on phonetic classification of the speech frame results in substantial performance gain. Speech frames are classified as unvoiced or voiced and are separately quantized with VQ codebooks designed for each class. Performance results, including listening tests, show that for transparent quality roughly 9 bits is sufficient for unvoiced frames and 24 bits for voiced frames. Test results of LPC quantization are described for a variable rate phonetically segmented CELP coder and for the synthesis of speech from the prediction residual.
William R. Gardner, University of California-San Diego (USA)
Bhaskar D. Rao, QualComm Inc. (USA)
This paper presents a class of quadratically weighted distortion measures which provide optimal performance for the high rate vector quantization (VQ) of linear predictive coding (LPC) parameters. It is shown that the quantization distortion of a high rate VQ converges to a quadratically weighted measure, where the quadratic weighting matrix is a "sensitivity" matrix, which is a generalization of the scalar sensitivity concept to the vector case. The sensitivity matrix is the second order term of the Taylor series expansion of the original distortion measure. Closed form expressions and computationally efficient methods for computing the sensitivity matrices of the different LPC parameterizations are given, which involve no numerical integration and can be implemented in real-time on modern DSP chips. In the general case, the "sum of sensitivity weighted scalar errors" is not equivalent to the original distortion measure. However, the sensitivity matrix of the line spectral pair (LSP) frequencies is exactly diagonal, demonstrating that for LSP's only a "sum of sensitivity weighted scalar errors" will result in optimal performance.
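The difference between the full quadratic measure and the "sum of sensitivity weighted scalar errors" can be seen in a small numerical sketch. The matrices below are made-up numbers, not sensitivity matrices computed from any LPC parameterization, and the function names are hypothetical.

```python
def quadratic_distortion(error, sensitivity):
    """Quadratically weighted distortion e^T S e for error vector e."""
    n = len(error)
    return sum(error[i] * sensitivity[i][j] * error[j]
               for i in range(n) for j in range(n))


def weighted_scalar_sum(error, sensitivity):
    """Sum of squared scalar errors weighted by the diagonal of S only."""
    return sum(sensitivity[i][i] * e * e for i, e in enumerate(error))


error = [1, -2]                 # illustrative quantization error
full = [[2, 1], [1, 3]]         # made-up non-diagonal matrix
diagonal = [[2, 0], [0, 3]]     # made-up diagonal matrix

print(quadratic_distortion(error, full), weighted_scalar_sum(error, full))
# 10 14: the two measures disagree when S has off-diagonal terms
print(quadratic_distortion(error, diagonal), weighted_scalar_sum(error, diagonal))
# 14 14: they coincide when S is diagonal
```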