Authors:
John J Parry,
Ian S Burnett,
Joe F Chicharo,
Page (NA) Paper number 1590
Abstract:
In this paper we investigate the spectral density of Line Spectral
Frequency (LSF) content in languages. The results show that the phonetic
variation of languages is reflected in the LSF space. This leads to
an alternative approach to the design of LSF quantisers. A trained
LSF codebook, like the phonetic inventory of a language, is a static
description of spectral behaviour of speech. As clear relationships
exist between phonetic segments and LSFs, the structure of an LSF codebook
can be analysed in terms of the phonetic segments. The new approach
incorporates phonetic information into the structure of LSF codebooks
through combining individual phonetic codebooks. The investigation
leads to the conclusion that phonetic information can be usefully employed
in codebook training in terms of perceptual performance and bit-rate
reductions.
Authors:
Adriana Vasilache,
Marcel Vasilache,
Ioan Tabus,
Page (NA) Paper number 1879
Abstract:
This paper introduces a new lattice quantization scheme, the multiple-scale
lattice vector quantization (MSLVQ), based on the truncation of the
D10+ lattice. The codebook is composed of several copies of the truncated
lattice scaled with different scaling factors. A fast nearest neighbor
search is introduced. We compare the performance of predictive MSLVQ
for quantization of LSF coefficients with the quantization technique
used in the G.729 codec and show that our method performs better
in terms of spectral distortion. The MSLVQ scheme achieves transparent
quality at 21 bits/frame.
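The idea of searching several scaled copies of a D_n+ lattice can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: it uses the textbook nearest-point rule for D_n (round, then fix the parity), ignores the truncation and index assignment that the abstract describes, and the function names and the scale values are hypothetical.

```python
import math


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_dn(x):
    """Nearest point of D_n: integer vectors whose coordinates sum to an even number."""
    r = [math.floor(v + 0.5) for v in x]
    if sum(r) % 2 != 0:
        # Parity is wrong: re-round the coordinate with the largest
        # rounding error in the opposite direction.
        k = max(range(len(x)), key=lambda i: abs(x[i] - r[i]))
        r[k] += 1 if x[k] > r[k] else -1
    return r


def nearest_dn_plus(x):
    """Nearest point of D_n+ = D_n united with D_n shifted by (1/2, ..., 1/2)."""
    a = nearest_dn(x)
    b = [v + 0.5 for v in nearest_dn([v - 0.5 for v in x])]
    return a if sqdist(x, a) <= sqdist(x, b) else b


def quantize_multiscale(x, scales):
    """Try every scaled copy of the (untruncated) lattice and keep the closest point.

    Returns (squared_error, scale, codevector). The scales are hypothetical inputs.
    """
    best = None
    for s in scales:
        y = [s * v for v in nearest_dn_plus([u / s for u in x])]
        d = sqdist(x, y)
        if best is None or d < best[0]:
            best = (d, s, y)
    return best
```

For instance, `quantize_multiscale([0.25] * 10, [1.0, 0.5])` selects the copy scaled by 0.5, because the input then lies exactly on the shifted coset. A practical scheme would also have to restrict each copy to a finite region and assign indices, which this sketch leaves out.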
Authors:
Joseph H Rothweiler,
Page (NA) Paper number 2022
Abstract:
Published techniques for computing line spectral frequencies generally
avoid rootfinding methods because of concerns about convergence and
complexity. However, this paper shows that stable predictor polynomials
have properties that make rootfinding an attractive approach. It is
well known that the problem of finding the LSFs for an Nth-order
predictor polynomial can be reduced to the problem of finding the roots
of a pair of polynomials of order N/2 with real roots. I extend this
result by showing that these polynomials have the following properties:
- It is possible to select starting points for Newton's rootfinding
method such that the iteration will converge monotonically to the largest
root.
- The Newton iteration can be modified to speed up the process
while still maintaining good convergence properties.
In this paper, I present the rootfinding procedures with proofs of their
good convergence properties. Finally, I present experimental results showing
that this procedure performs well on speech signals, and that it can be
implemented on fixed-point DSPs.
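The monotone convergence property can be illustrated with a generic sketch. It assumes only the standard fact that, for a polynomial whose roots are all real, Newton's method started above the largest root decreases monotonically towards it. The starting point used here (the Cauchy root bound) and the function names are assumptions for illustration; the paper's own starting points and its accelerated iteration are not reproduced.

```python
def horner(coeffs, x):
    """Evaluate a polynomial and its derivative at x (coefficients highest degree first)."""
    p, dp = 0.0, 0.0
    for c in coeffs:
        dp = dp * x + p
        p = p * x + c
    return p, dp


def largest_real_root(coeffs, tol=1e-12, max_iter=200):
    """Newton iteration from above the largest root of an all-real-root polynomial.

    Returns the root estimate and the list of iterates, so the monotone
    descent can be inspected.
    """
    lead = coeffs[0]
    # Cauchy bound: every root has magnitude below this value (assumed start).
    x = 1.0 + max(abs(c / lead) for c in coeffs[1:])
    path = [x]
    for _ in range(max_iter):
        p, dp = horner(coeffs, x)
        if dp == 0.0:
            break
        step = p / dp
        x -= step
        path.append(x)
        if abs(step) < tol:
            break
    return x, path


# Example: (x - 0.9)(x - 0.2)(x + 0.5) = x^3 - 0.6 x^2 - 0.37 x + 0.09
root, path = largest_real_root([1.0, -0.6, -0.37, 0.09])
```

Once the largest root is found it can be divided out and the procedure repeated on the deflated polynomial, although deflation is not shown here.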
Authors:
Bob Novorita,
Page (NA) Paper number 2037
Abstract:
The objective of this paper is to extend a promising objective speech
distortion measurement method, the Bark Spectral Distance (BSD) measure,
with the auditory concepts of forward and backward temporal masking
to improve its measurement accuracy. The results of this investigation
show that automatic BSD-based speech quality ratings may be made to
correlate better with existing MOS ratings by removing perceptually
irrelevant areas of speech from the distance measure. The correlation
between the objective BSD measure and the subjective MOS measure increases
from 0.91 to 0.98. The best results were found with a window duration
of 128 samples, use of exponential-slope filter characteristics for
both forward and backward masking effects, forward masking delays up
to 100 msec, and a backward masking time advance of 40 msec.
Authors:
Manohar N Murthi,
Bhaskar D Rao,
Page (NA) Paper number 2120
Abstract:
We present several analytical properties of Minimum Variance Distortionless
Response (MVDR) based all-pole models that demonstrate the advantages
and usefulness of these models for speech spectral coding. In particular,
we show that a sufficient order MVDR all-pole model provides a spectral
envelope that fits a set of spectral samples exactly with a parameterization
convenient for quantization purposes. In addition, we show that MVDR
all-pole filters provide a monotonically decreasing spectral distortion
with increasing filter order. Furthermore, we show that the MVDR all-pole
filter possesses the flexibility to be obtained from correlations based
upon either spectral samples or conventional time-domain correlations.
Finally, exploiting the insight gained from MVDR modeling, we introduce
a novel class of constrained all-pole models for efficient spectral
coding. In this approach, a subset of the Line Spectral Frequency (LSF)
parameters associated with the all-pole model are judiciously fixed,
leading to a simpler model parameterization.
Authors:
Wonho Yang,
Robert Yantorno,
Page (NA) Paper number 2178
Abstract:
The Modified Bark Spectral Distortion (MBSD), used for an objective
speech quality measure, was presented previously [1][2]. The MBSD measure
estimates speech distortion in the loudness domain taking into account
the noise masking threshold in order to include only audible distortions
in the calculation of the distortion measure. Preliminary simulation
results have shown improvement of the MBSD over the conventional BSD.
In this paper, the performance of the MBSD is improved by scaling the noise
masking threshold, and the MBSD is compared with the ITU-T Recommendation
P.861 [3] and MNB [4] measures. Correlation analysis with MOS difference instead
of MOS has been examined in order to evaluate objective speech quality
measures.
Authors:
Per Hedelin,
Jan Skoglund,
Jonas Samuelsson,
Page (NA) Paper number 2332
Abstract:
This paper presents a method for obtaining numerical estimates of high
rate vector quantization (VQ) performance suitable for sources for
which the pdf is not analytically available. In the proposed method,
the VQ point density is described by a Gaussian mixture model optimized
for the data. Employing this method for LPC spectrum quantization,
we obtain high rate expressions for both the average spectral distortion
(SD) and the distribution function of the SD. We estimate the minimum number
of bits required for a quantizer to obtain an average SD of 1 dB and the
outlier statistics for that quantizer. We find that approximately 3
bits can be saved as compared to a 2-split LSF-based vector quantizer.
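For reference, the per-frame spectral distortion between an original and a quantized LPC envelope is commonly computed as the root-mean-square difference of the two log power spectra in dB. The sketch below assumes that common definition, a uniform frequency grid over [0, pi), and stable filters with no zeros of A(z) on the unit circle. The function names and the grid size are hypothetical, and this is not the high-rate estimation method of the paper.

```python
import cmath
import math


def lpc_log_spectrum_db(a, n_freq=256):
    """Log power spectrum 10*log10(1/|A(e^{jw})|^2) of an all-pole model.

    a holds the coefficients of A(z) = a[0] + a[1] z^-1 + ... (usually a[0] = 1).
    """
    spectrum = []
    for k in range(n_freq):
        w = math.pi * k / n_freq
        A = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
        spectrum.append(-20.0 * math.log10(abs(A)))
    return spectrum


def spectral_distortion_db(a_ref, a_quant, n_freq=256):
    """RMS log-spectral difference (in dB) between two all-pole envelopes."""
    s_ref = lpc_log_spectrum_db(a_ref, n_freq)
    s_q = lpc_log_spectrum_db(a_quant, n_freq)
    mean_sq = sum((p - q) ** 2 for p, q in zip(s_ref, s_q)) / n_freq
    return math.sqrt(mean_sq)
```

The average SD quoted for a quantizer is then the mean of this per-frame value over a test set, and outlier statistics count the frames whose SD exceeds chosen thresholds.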
Authors:
Jan Lindén,
Page (NA) Paper number 2417
Abstract:
In this paper combined source-channel coding is considered for the
case of predictive vector quantization. A design algorithm for channel
optimized predictive vector quantizers is proposed. Under reasonable
assumptions, the optimal encoder is presented and a sample iterative
design method that simultaneously optimizes the predictor and the codebook
is derived. We also demonstrate that this design method can be used
to obtain index assignments that are superior to those obtained
by post-processing index assignment algorithms. Results are presented
for a correlated Gauss-Markov process and for speech LSF parameters.