SP-21.1

Linguistic Mapping in LSF Space for Low-Bit Rate Coding
John J Parry, Ian S Burnett, Joe F Chicharo (University of Wollongong)

In this paper we investigate the spectral density of Line Spectral Frequency (LSF) content in languages. The results show that the phonetic variation of languages is reflected in the LSF space. This leads to an alternative approach to the design of LSF quantisers. A trained LSF codebook, like the phonetic inventory of a language, is a static description of spectral behaviour of speech. As clear relationships exist between phonetic segments and LSFs, the structure of an LSF codebook can be analysed in terms of the phonetic segments. The new approach incorporates phonetic information into the structure of LSF codebooks through combining individual phonetic codebooks. The investigation leads to the conclusion that phonetic information can be usefully employed in codebook training in terms of perceptual performance and bit-rate reductions.

SP-21.2

Predictive Multiple-Scale Lattice VQ for LSF Quantization
Adriana Vasilache (Tampere University of Technology), Marcel Vasilache (Nokia Research Center), Ioan Tabus (Tampere University of Technology)

This paper introduces a new lattice quantization scheme, the multiple-scale lattice vector quantization (MSLVQ), based on the truncation of the D10+ lattice. The codebook is composed of several copies of the truncated lattice scaled with different scaling factors. A fast nearest neighbor search is introduced. We compare the performance of predictive MSLVQ for quantization of LSF coefficients with the quantization technique used in the G.729 codec and show that our method performs better in terms of spectral distortion. The MSLVQ scheme achieves transparent quality at 21 bits/frame.
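The abstract does not say how its fast search works, so the following is only a minimal sketch of the general idea, assuming the standard round-and-fix-parity rule for the D_n lattice and an untruncated lattice. The function names, the `scales` list, and the omission of both the truncation and the predictive stage are illustrative assumptions, not the authors' method.

```python
import math

def dist2(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def nearest_dn(x):
    """Nearest point of D_n (integer vectors with even coordinate sum)."""
    r = [math.floor(v + 0.5) for v in x]
    if sum(r) % 2 != 0:
        # Parity is wrong: re-round the coordinate with the largest
        # rounding error in the opposite direction.
        k = max(range(len(x)), key=lambda i: abs(x[i] - r[i]))
        r[k] += 1 if x[k] > r[k] else -1
    return r

def nearest_dn_plus(x):
    """Nearest point of D_n+ = D_n united with D_n + (1/2, ..., 1/2)."""
    a = nearest_dn(x)
    b = [v + 0.5 for v in nearest_dn([u - 0.5 for u in x])]
    return min((a, b), key=lambda p: dist2(x, p))

def mslvq_quantize(x, scales):
    """Try every scaled copy of the lattice and keep the closest point.

    Assumption: no truncation is applied, so every lattice point is allowed.
    Returns (index of the chosen scale, reconstructed vector).
    """
    best = None
    for j, s in enumerate(scales):
        p = nearest_dn_plus([v / s for v in x])
        y = [s * v for v in p]
        d = dist2(x, y)
        if best is None or d < best[0]:
            best = (d, j, y)
    return best[1], best[2]
```

A call such as `mslvq_quantize(vector, [0.5, 1.0, 2.0])` searches three scaled copies; a real codec would also have to encode the lattice point index within the truncated region.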

SP-21.3

A Rootfinding Algorithm for Line Spectral Frequencies
Joseph H Rothweiler (Sanders, A Lockheed Martin Company)

Published techniques for computing line spectral frequencies generally avoid rootfinding methods because of concerns about convergence and complexity. However, this paper shows that stable predictor polynomials have properties that make rootfinding an attractive approach. It is well known that the problem of finding the LSFs for an Nth-order predictor polynomial can be reduced to the problem of finding the roots of a pair of polynomials of order N/2 with real roots. I extend this result by showing that these polynomials have the following properties:
- It is possible to select starting points for a Newton's rootfinding method such that the iteration will converge monotonically to the largest root.
- The Newton iteration can be modified to speed up the process while still maintaining good convergence properties.
In this paper, I present the rootfinding procedures with proofs of their good convergence properties. Finally, I present experimental results showing that this procedure performs well on speech signals, and that it can be implemented on fixed-point DSPs.
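The monotone convergence property can be illustrated with a small sketch. It assumes a polynomial whose roots are all real and starts Newton's method from the Cauchy bound, which lies above every root; that starting point and the function names are assumptions made for illustration and are not taken from the paper.

```python
def horner(coeffs, x):
    """Evaluate a polynomial and its derivative at x.

    coeffs lists the coefficients from the highest degree down.
    """
    p, dp = 0.0, 0.0
    for c in coeffs:
        dp = dp * x + p
        p = p * x + c
    return p, dp

def largest_real_root(coeffs, tol=1e-12, max_iter=200):
    """Newton iteration started above all roots of a real-rooted polynomial.

    Returns the root estimate and the list of iterates, which decrease
    towards the largest root.
    """
    lead = coeffs[0]
    # Cauchy bound: every root has magnitude below this value.
    x = 1.0 + max(abs(c / lead) for c in coeffs[1:])
    trace = [x]
    for _ in range(max_iter):
        p, dp = horner(coeffs, x)
        if dp == 0.0:
            break
        step = p / dp
        x -= step
        trace.append(x)
        if abs(step) < tol:
            break
    return x, trace

# (x - 0.9)(x - 0.2)(x + 0.5) = x^3 - 0.6 x^2 - 0.37 x + 0.09
root, trace = largest_real_root([1.0, -0.6, -0.37, 0.09])
```

Once the largest root is found it can be divided out (deflation) and the procedure repeated on the quotient to obtain the remaining roots.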

SP-21.4

Incorporation of Temporal Masking Effects into Bark Spectral Distortion Measure
Bob Novorita (Motorola and University of Illinois - Chicago)

The objective of this paper is to extend a promising objective speech distortion measurement method, the Bark Spectral Distance (BSD) measure, with the auditory concepts of forward and backward temporal masking to improve its measurement accuracy. The results of this investigation show that automatic BSD-based speech quality ratings may be made to correlate better with existing MOS ratings by removing perceptually irrelevant areas of speech from the distance measure. The correlation between the objective BSD measure and the subjective MOS measure increases from 0.91 to 0.98. The best results were found with a window duration of 128 samples, use of exponential-slope filter characteristics for both forward and backward masking effects, forward masking delays up to 100 msec, and a backward masking time advance of 40 msec.

SP-21.5

MVDR Based All-Pole Models for Spectral Coding of Speech
Manohar N Murthi, Bhaskar D Rao (Dept. ECE, University of California, San Diego)

We present several analytical properties of Minimum Variance Distortionless Response (MVDR) based all-pole models that demonstrate the advantages and usefulness of these models for speech spectral coding. In particular, we show that a sufficient order MVDR all-pole model provides a spectral envelope that fits a set of spectral samples exactly with a parameterization convenient for quantization purposes. In addition, we show that MVDR all-pole filters provide a monotonically decreasing spectral distortion with increasing filter order. Furthermore, we show that the MVDR all-pole filter possesses the flexibility to be obtained from correlations based upon either spectral samples or conventional time-domain correlations. Finally, exploiting the insight gained from MVDR modeling, we introduce a novel class of constrained all-pole models for efficient spectral coding. In this approach, a subset of the Line Spectral Frequency (LSF) parameters associated with the all-pole model are judiciously fixed, leading to a simpler model parameterization.

SP-21.6

Improvement of MBSD by Scaling Noise Masking Threshold and Correlation Analysis with MOS Difference Instead of MOS
Wonho Yang, Robert Yantorno (Electrical & Computer Engineering Department, College of Engineering, Temple University)

The Modified Bark Spectral Distortion (MBSD), an objective speech quality measure, was presented previously [1][2]. The MBSD measure estimates speech distortion in the loudness domain, taking into account the noise masking threshold in order to include only audible distortions in the calculation of the distortion measure. Preliminary simulation results have shown improvement of the MBSD over the conventional BSD. In this paper, the performance of the MBSD is improved by scaling the noise masking threshold, and the result is compared to the ITU-T Recommendation P.861 [3] and MNB [4] measures. Correlation analysis with MOS difference instead of MOS has been examined in order to evaluate objective speech quality measures.

SP-21.7

Performance Bounds for LPC Spectrum Quantization
Per Hedelin (Information Theory, Chalmers University of Technology), Jan Skoglund, Jonas Samuelsson (Information Theory, Chalmers University of Technology)

This paper presents a method for obtaining numerical estimates of high rate vector quantization (VQ) performance suitable for sources for which the pdf is not analytically available. In the proposed method, the VQ point density is described from a Gaussian mixture model optimized for the data. Employing this method for LPC spectrum quantization, we obtain high rate expressions for both the average spectral distortion (SD) and the distribution function of the SD. We estimate the minimum bits required for a quantizer to obtain an average SD of 1 dB and the outlier statistics for that quantizer. We find that approximately 3 bits can be saved as compared to a 2-split LSF-based vector quantizer.
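For reference, the spectral distortion figure quoted above is commonly computed as the root-mean-square difference between two log power spectra in dB. The sketch below assumes all-pole spectra of the form gain^2 / |A(e^{jw})|^2 sampled uniformly over the full band from 0 to pi; the function names, the grid size, and the full-band choice are illustrative assumptions, and the paper may use a different band or averaging.

```python
import cmath
import math

def lpc_log_spectrum_db(a, gain=1.0, n_freq=256):
    """Log power spectrum in dB of gain / A(z), A(z) = sum_k a[k] z^-k.

    a is the predictor polynomial [1, a1, ..., ap]; frequencies are
    sampled at the midpoints of n_freq equal bins on (0, pi).
    """
    out = []
    for m in range(n_freq):
        w = math.pi * (m + 0.5) / n_freq
        A = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        out.append(10.0 * math.log10(gain ** 2 / abs(A) ** 2))
    return out

def spectral_distortion(a_ref, a_test, gain_ref=1.0, gain_test=1.0, n_freq=256):
    """RMS log-spectral difference in dB between two all-pole envelopes."""
    s1 = lpc_log_spectrum_db(a_ref, gain_ref, n_freq)
    s2 = lpc_log_spectrum_db(a_test, gain_test, n_freq)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(s1, s2)) / n_freq)
```

Averaging this per-frame value over a test set gives the average SD, and counting frames above a threshold gives outlier statistics of the kind the abstract mentions.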

SP-21.8

Channel Optimized Predictive VQ
Jan Lindén (Chalmers University of Technology)

In this paper, combined source-channel coding is considered for the case of predictive vector quantization. A design algorithm for channel optimized predictive vector quantizers is proposed. Under reasonable assumptions, the optimal encoder is presented and a sample iterative design method that simultaneously optimizes the predictor and the codebook is derived. We also demonstrate that this design method can be used to obtain index assignments that are superior to those obtained by post-processing index assignment algorithms. Results are presented for a correlated Gauss-Markov process and for speech LSF parameters.
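As background, the encoder of a memoryless channel optimized VQ picks the index that minimizes the expected distortion after transmission, rather than the nearest codevector. The sketch below shows that rule only; it assumes a binary symmetric channel with bit error probability `eps`, leaves out the predictive part treated in the paper, and uses function names chosen for illustration.

```python
def dist2(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def bsc_transition(i, j, bits, eps):
    """Probability that index i is received as j over a binary symmetric channel."""
    d = bin(i ^ j).count("1")  # Hamming distance between the index bit patterns
    return (eps ** d) * ((1.0 - eps) ** (bits - d))

def covq_encode(x, codebook, bits, eps):
    """Choose the index minimizing the expected distortion at the decoder."""
    def expected_distortion(i):
        return sum(bsc_transition(i, j, bits, eps) * dist2(x, c)
                   for j, c in enumerate(codebook))
    return min(range(len(codebook)), key=expected_distortion)
```

With `eps = 0` the rule reduces to an ordinary nearest-neighbor search. For `eps > 0` the choice also depends on how indices are assigned to codevectors, which is why training a codebook under this rule yields an index assignment as a by-product.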


Last Update: February 4, 1999 Ingo Höntsch