|
Abstract: Session SP-21 |
|
SP-21.1
|
Linguistic Mapping in LSF Space for Low-Bit Rate Coding
John J Parry,
Ian S Burnett,
Joe F Chicharo (University of Wollongong)
In this paper we investigate the spectral density of Line Spectral Frequency (LSF) content in languages. The results show that the phonetic variation of languages is reflected in the LSF space. This leads to an alternative approach to the design of LSF quantisers. A trained LSF codebook, like the phonetic inventory of a language, is a static description of spectral behaviour of speech. As clear relationships exist between phonetic segments and LSFs, the structure of an LSF codebook can be analysed in terms of the phonetic segments. The new approach incorporates phonetic information into the structure of LSF codebooks through combining individual phonetic codebooks. The investigation leads to the conclusion that phonetic information can be usefully employed in codebook training in terms of perceptual performance and bit-rate reductions.
|
SP-21.2
|
Predictive Multiple-Scale Lattice VQ for LSF Quantization
Adriana Vasilache (Tampere University of Technology),
Marcel Vasilache (Nokia Research Center),
Ioan Tabus (Tampere University of Technology)
This paper introduces a new lattice quantization scheme, the
multiple-scale lattice vector quantization (MSLVQ), based on the
truncation of the D10+ lattice. The codebook is composed
of several copies of the truncated lattice scaled with different
scaling factors. A fast nearest neighbor search is introduced.
We compare predictive MSLVQ for quantization of LSF
coefficients with the quantization technique used in the G.729 codec
and show that our method performs better in terms of spectral
distortion. The MSLVQ scheme achieves transparent quality at 21
bits/frame.
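As a rough illustration of the lattice search idea, the sketch below finds the nearest point of the untruncated D_n+ lattice (the union of D_n and D_n shifted by 1/2 in every coordinate) using the classical round-and-fix-parity rule, then tries several scaled copies and keeps the closest point. It is not the paper's algorithm: the truncation, the prediction stage, and the actual scaling factors are not given in the abstract, and all function names here are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_dn(x):
    """Nearest point of D_n: integer vectors whose coordinates sum to an even number."""
    rounded = [math.floor(v + 0.5) for v in x]
    if sum(rounded) % 2 == 0:
        return rounded
    # Parity is wrong: re-round the coordinate with the largest rounding
    # error in the opposite direction, which is the cheapest possible fix.
    k = max(range(len(x)), key=lambda i: abs(x[i] - rounded[i]))
    rounded[k] += 1 if x[k] > rounded[k] else -1
    return rounded


def nearest_dn_plus(x):
    """Nearest point of D_n+ = D_n united with D_n + (1/2, ..., 1/2)."""
    a = nearest_dn(x)
    b = [v + 0.5 for v in nearest_dn([v - 0.5 for v in x])]
    return a if sq_dist(x, a) <= sq_dist(x, b) else b


def nearest_multiscale(x, scales):
    """Try each scaled copy of the lattice; return (distortion, scale, point).

    Assumption: no truncation is applied, so every scale offers the whole
    lattice. A truncated codebook would also reject out-of-range points.
    """
    best = None
    for s in scales:
        point = [s * v for v in nearest_dn_plus([v / s for v in x])]
        d = sq_dist(x, point)
        if best is None or d < best[0]:
            best = (d, s, point)
    return best
```

For a 10-dimensional LSF residual, `nearest_multiscale(residual, scales)` would be called with a list of ten values. The scale list itself is an illustrative input, not a value taken from the paper.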
|
SP-21.3
|
A Rootfinding Algorithm for Line Spectral Frequencies
Joseph H Rothweiler (Sanders, A Lockheed Martin Company)
Published techniques for computing line spectral frequencies generally
avoid rootfinding methods because of concerns about
convergence and complexity. However, this paper shows that stable
predictor polynomials have
properties that make rootfinding an attractive approach.
It is well known that the problem of finding the LSFs for an Nth-order
predictor polynomial can be reduced to the problem of finding the roots
of a pair of polynomials of order N/2 with real roots. I extend this
result by showing that these polynomials have the following properties:
- It is possible to select starting points for Newton's rootfinding
method such that the iteration will converge monotonically to the
largest root.
- The Newton iteration can be modified to speed up the process
while still maintaining good convergence properties.
In this paper, I present
the rootfinding procedures with proofs of their good convergence properties.
Finally, I present experimental results showing that this procedure performs
well on speech signals, and that it can be implemented on fixed-point DSPs.
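As a rough illustration of the first property, the sketch below runs a plain Newton iteration on a polynomial whose roots are all real, starting from a point above the largest root. For such polynomials the iterates decrease monotonically toward that root. The Cauchy-bound starting point and the function names are assumptions made for this example; the abstract does not state the paper's actual starting points or its accelerated iteration.

```python
def poly_eval_with_derivative(coeffs, x):
    """Evaluate p(x) and p'(x) by Horner's rule; coeffs are highest degree first."""
    p = coeffs[0]
    dp = 0.0
    for c in coeffs[1:]:
        dp = dp * x + p
        p = p * x + c
    return p, dp


def largest_real_root(coeffs, tol=1e-12, max_iter=200):
    """Newton iteration started above every root of a real-rooted polynomial.

    Assumption: the start is the Cauchy bound 1 + max|a_i / a_n|, which is
    guaranteed to lie above the largest root. A dedicated LSF solver could
    use a tighter start because the roots lie in a known interval.
    """
    lead = coeffs[0]
    x = 1.0 + max(abs(c / lead) for c in coeffs[1:])
    for _ in range(max_iter):
        p, dp = poly_eval_with_derivative(coeffs, x)
        if dp == 0.0:
            break
        step = p / dp
        x_new = x - step
        if abs(step) <= tol * max(1.0, abs(x_new)):
            return x_new
        x = x_new
    return x


# Example: (x - 0.9)(x - 0.2)(x + 0.5) = x^3 - 0.6 x^2 - 0.37 x + 0.09
example_root = largest_real_root([1.0, -0.6, -0.37, 0.09])
```

The remaining roots could then be found by deflating the polynomial and repeating, although deflation on fixed-point hardware raises accuracy questions that this sketch does not address.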
|
SP-21.4
|
Incorporation of Temporal Masking Effects into Bark Spectral Distortion Measure
Bob Novorita (Motorola and University of Illinois - Chicago)
This paper extends a promising objective speech distortion measure, the Bark Spectral Distance (BSD) measure, with the auditory concepts of forward and backward temporal masking to improve its measurement accuracy. The results of this investigation show that automatic BSD-based speech quality ratings may be made to correlate better with existing MOS ratings by removing perceptually irrelevant areas of speech from the distance measure. The correlation between the objective BSD measure and the subjective MOS measure increases from 0.91 to 0.98. The best results were found with a window duration of 128 samples, exponential-slope filter characteristics for both forward and backward masking effects, forward masking delays up to 100 msec, and a backward masking time advance of 40 msec.
|
SP-21.5
|
MVDR Based All-Pole Models for Spectral Coding of Speech
Manohar N Murthi,
Bhaskar D Rao (Dept. ECE, University of California, San Diego)
We present several analytical properties of Minimum
Variance Distortionless Response (MVDR) based all-pole
models that demonstrate the advantages and usefulness
of these models for speech spectral coding. In
particular, we show that a sufficient order MVDR all-pole
model provides a spectral envelope that fits a set of
spectral samples exactly with a parameterization
convenient for quantization purposes. In addition, we
show that MVDR all-pole filters provide a monotonically
decreasing spectral distortion with increasing filter
order. Furthermore, we show that the MVDR all-pole
filter possesses the flexibility to be obtained from
correlations based upon either spectral samples or
conventional time-domain correlations. Finally,
exploiting the insight gained from MVDR modeling, we
introduce a novel class of constrained all-pole models
for efficient spectral coding. In this approach, a
subset of the Line Spectral Frequency (LSF) parameters
associated with the all-pole model is judiciously
fixed, leading to a simpler model parameterization.
|
SP-21.6
|
Improvement of MBSD by Scaling Noise Masking Threshold and Correlation Analysis with MOS Difference Instead of MOS
Wonho Yang,
Robert Yantorno (Electrical & Computer Engineering Department, College of Engineering, Temple University)
The Modified Bark Spectral Distortion (MBSD), an objective speech quality measure, was presented previously [1][2]. The MBSD measure estimates speech distortion in the loudness domain, taking into account the noise masking threshold so that only audible distortions enter the calculation of the distortion measure. Preliminary simulation results have shown improvement of the MBSD over the conventional BSD. In this paper, the performance of the MBSD is improved by scaling the noise masking threshold, and the improved measure is compared with the ITU-T Recommendation P.861 [3] and MNB [4] measures. Correlation analysis with MOS difference instead of MOS is also examined as a way to evaluate objective speech quality measures.
|
SP-21.7
|
Performance Bounds for LPC Spectrum Quantization
Per Hedelin (Information Theory, Chalmers University of Technology),
Jan Skoglund,
Jonas Samuelsson (Information Theory, Chalmers University of Technolo)
This paper presents a method for obtaining numerical estimates of high rate
vector quantization (VQ) performance suitable for sources for which
the pdf is not analytically available. In the proposed method, the VQ point
density is described from a Gaussian mixture model optimized for the data.
Employing this method for LPC spectrum quantization, we obtain high rate
expressions for both the average spectral distortion (SD) and the
distribution function of the SD. We estimate the minimum bits required for a
quantizer to obtain an average SD of 1 dB and the outlier statistics for that
quantizer. We find that approximately 3 bits can be saved as
compared to a 2-split LSF-based vector quantizer.
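For readers unfamiliar with the SD figure quoted above, the sketch below computes the usual root-mean-square log spectral distortion, in dB, between two all-pole (LPC) spectra. The uniform frequency grid over the full band, the number of grid points, and the function names are assumptions for this example; published evaluations often restrict the frequency range, and the abstract does not say which convention the paper uses.

```python
import cmath
import math


def lpc_power_db(a, omega):
    """Return 10*log10 |1/A(e^{j*omega})|^2 for A(z) = sum_k a[k] z^-k."""
    A = sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(a))
    return -20.0 * math.log10(abs(A))


def spectral_distortion_db(a_ref, a_quant, n_freqs=256):
    """RMS difference of two LPC log spectra over a uniform grid on (0, pi)."""
    total = 0.0
    for i in range(n_freqs):
        omega = math.pi * (i + 0.5) / n_freqs
        diff = lpc_power_db(a_ref, omega) - lpc_power_db(a_quant, omega)
        total += diff * diff
    return math.sqrt(total / n_freqs)
```

Averaging `spectral_distortion_db` over many frames gives the average SD, and counting frames above a chosen threshold gives outlier statistics of the kind the abstract mentions.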
|
SP-21.8
|
Channel Optimized Predictive VQ
Jan Lindén (Chalmers University of Technology)
In this paper combined source-channel coding is considered for the
case of predictive vector quantization. A design algorithm for
channel optimized predictive vector quantizers is proposed. Under
reasonable assumptions, the optimal encoder is presented and a sample
iterative design method that simultaneously optimizes the predictor
and the codebook is derived. We also demonstrate that this design
method can be used to obtain index assignments that compare favorably
with those obtained by post-processing index assignment algorithms.
Results are presented for a correlated Gauss-Markov process and for
speech LSF parameters.
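To make the idea of channel optimization concrete, the sketch below shows the classical channel-optimized VQ encoding rule without the predictor: the encoder picks the index that minimizes the expected distortion after the channel, rather than the nearest codevector. The binary symmetric channel model, the squared-error distortion, and all names are assumptions for this example, and the paper's own encoder and design algorithm are not reproduced here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def bsc_transition(i, j, n_bits, ber):
    """P(receive index j | send index i) on a binary symmetric channel."""
    flips = bin(i ^ j).count("1")
    return (ber ** flips) * ((1.0 - ber) ** (n_bits - flips))


def covq_encode(x, codebook, n_bits, ber):
    """Return the index minimizing the expected distortion seen by the decoder.

    Assumption: the codebook has 2**n_bits entries and bit errors are
    independent with probability ber.
    """
    best_index, best_cost = 0, float("inf")
    for i in range(len(codebook)):
        cost = sum(
            bsc_transition(i, j, n_bits, ber) * sq_dist(x, codebook[j])
            for j in range(len(codebook))
        )
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index
```

With `ber` set to zero the rule reduces to ordinary nearest-neighbor encoding. In a predictive scheme, `x` would be the prediction residual rather than the raw parameter vector.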
|
|