Speech Analysis and Quantization

Linguistic Mapping in LSF Space for Low-Bit Rate Coding

Authors:

John J Parry,
Ian S Burnett,
Joe F Chicharo,

Page (NA) Paper number 1590

Abstract:

In this paper we investigate the spectral density of Line Spectral Frequency (LSF) content in languages. The results show that the phonetic variation of languages is reflected in the LSF space. This leads to an alternative approach to the design of LSF quantisers. A trained LSF codebook, like the phonetic inventory of a language, is a static description of spectral behaviour of speech. As clear relationships exist between phonetic segments and LSFs, the structure of an LSF codebook can be analysed in terms of the phonetic segments. The new approach incorporates phonetic information into the structure of LSF codebooks through combining individual phonetic codebooks. The investigation leads to the conclusion that phonetic information can be usefully employed in codebook training in terms of perceptual performance and bit-rate reductions.


Predictive Multiple-Scale Lattice VQ for LSF Quantization

Authors:

Adriana Vasilache,
Marcel Vasilache,
Ioan Tabus,

Page (NA) Paper number 1879

Abstract:

This paper introduces a new lattice quantization scheme, multiple-scale lattice vector quantization (MSLVQ), based on the truncation of the D10+ lattice. The codebook is composed of several copies of the truncated lattice, each scaled by a different scaling factor. A fast nearest neighbor search is introduced. We compare the performance of predictive MSLVQ for quantization of LSF coefficients with the quantization technique used in the G.729 codec and show that our method performs better in terms of spectral distortion. The MSLVQ scheme achieves transparent quality at 21 bits/frame.
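
As a rough illustration of the multiple-scale idea, the sketch below searches several scaled copies of the D_n+ lattice and keeps the closest point. It is a minimal sketch under stated assumptions, not the authors' algorithm: it uses the textbook rounding rule for the nearest point in D_n, it omits the truncation and the prediction described in the abstract, and the function names and the scale values in the usage lines are hypothetical.

```python
import numpy as np

def nearest_dn(x):
    """Nearest point of D_n (integer vectors with even coordinate sum)."""
    x = np.asarray(x, dtype=float)
    f = np.rint(x)
    if int(f.sum()) % 2 != 0:
        # Parity is wrong: re-round the coordinate with the largest
        # rounding error in the opposite direction.
        k = int(np.argmax(np.abs(x - f)))
        f[k] += 1.0 if x[k] >= f[k] else -1.0
    return f

def nearest_dn_plus(x):
    """Nearest point of D_n+ = D_n union (D_n + (1/2, ..., 1/2))."""
    x = np.asarray(x, dtype=float)
    a = nearest_dn(x)
    b = nearest_dn(x - 0.5) + 0.5
    return a if np.sum((x - a) ** 2) <= np.sum((x - b) ** 2) else b

def mslvq_quantize(x, scales):
    """Try every scaled copy of the (untruncated) lattice, keep the best."""
    x = np.asarray(x, dtype=float)
    best = None
    for s in scales:
        y = s * nearest_dn_plus(x / s)
        err = float(np.sum((x - y) ** 2))
        if best is None or err < best[0]:
            best = (err, s, y)
    return best[1], best[2]

# Hypothetical usage with two illustrative scale factors.
scale, point = mslvq_quantize(np.full(10, 0.4), [1.0, 2.0])
```

A practical codebook would also restrict each scaled copy to a finite region and assign indices to the retained points; both steps are left out here.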


A Rootfinding Algorithm for Line Spectral Frequencies

Authors:

Joseph H Rothweiler,

Page (NA) Paper number 2022

Abstract:

Published techniques for computing line spectral frequencies generally avoid rootfinding methods because of concerns about convergence and complexity. However, this paper shows that stable predictor polynomials have properties that make rootfinding an attractive approach. It is well known that the problem of finding the LSFs for an Nth-order predictor polynomial can be reduced to the problem of finding the roots of a pair of polynomials of order N/2 with real roots. I extend this result by showing that these polynomials have the following properties:
- It is possible to select starting points for Newton's rootfinding method such that the iteration will converge monotonically to the largest root.
- The Newton iteration can be modified to speed up the process while still maintaining good convergence properties.
In this paper, I present the rootfinding procedures with proofs of their good convergence properties. Finally, I present experimental results showing that this procedure performs well on speech signals, and that it can be implemented on fixed-point DSPs.
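
The monotone convergence property can be illustrated with plain Newton iteration on a polynomial whose roots are all real. This is a minimal sketch under stated assumptions, not the paper's procedure: it assumes the roots are the cosines of the LSFs (so they lie in (-1, 1) and x = 1 is a valid starting point), it builds the test polynomial directly from known roots instead of deriving it from a predictor polynomial, it finds the remaining roots by simple deflation, and it does not include the paper's accelerated iteration. The function names are hypothetical.

```python
import numpy as np

def largest_root_newton(coeffs, x0, tol=1e-12, max_iter=100):
    """Newton iteration started at or above the largest real root."""
    p = np.poly1d(coeffs)
    dp = p.deriv()
    x = float(x0)
    for _ in range(max_iter):
        step = p(x) / dp(x)
        x -= step  # moves toward the largest root from above
        if abs(step) < tol:
            break
    return x

def real_roots_by_deflation(coeffs, x0):
    """Find all roots, largest first, dividing out each root as it is found."""
    coeffs = np.asarray(coeffs, dtype=float)
    roots = []
    while len(coeffs) > 1:
        r = largest_root_newton(coeffs, x0)
        roots.append(r)
        coeffs, _ = np.polydiv(coeffs, [1.0, -r])
        x0 = r  # every remaining root lies below r
    return roots

# Illustrative frequencies in radians (made-up values, not speech data).
lsf_true = np.array([0.3, 0.9, 1.5, 2.2, 2.8])
poly = np.poly(np.cos(lsf_true))
lsf_found = np.arccos(real_roots_by_deflation(poly, 1.0))
```

Because the cosine is decreasing on (0, pi), the largest root corresponds to the lowest frequency, so the roots come out in ascending frequency order.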


Incorporation of Temporal Masking Effects into Bark Spectral Distortion Measure

Authors:

Bob Novorita,

Page (NA) Paper number 2037

Abstract:

The objective of this paper is to extend a promising objective speech distortion measurement method, the Bark Spectral Distance (BSD) measure, with the auditory concepts of forward and backward temporal masking to improve its measurement accuracy. The results of this investigation show that automatic BSD-based speech quality ratings may be made to correlate better with existing MOS ratings by removing perceptually irrelevant areas of speech from the distance measure. The correlation between the objective BSD measure and the subjective MOS measure increases from 0.91 to 0.98. The best results were found with a window duration of 128 samples, use of exponential-slope filter characteristics for both forward and backward masking effects, forward masking delays up to 100 msec, and a backward masking time advance of 40 msec.


MVDR Based All-Pole Models For Spectral Coding Of Speech

Authors:

Manohar N Murthi,
Bhaskar D Rao,

Page (NA) Paper number 2120

Abstract:

We present several analytical properties of Minimum Variance Distortionless Response (MVDR) based all-pole models that demonstrate the advantages and usefulness of these models for speech spectral coding. In particular, we show that a sufficient order MVDR all-pole model provides a spectral envelope that fits a set of spectral samples exactly with a parameterization convenient for quantization purposes. In addition, we show that MVDR all-pole filters provide a monotonically decreasing spectral distortion with increasing filter order. Furthermore, we show that the MVDR all-pole filter possesses the flexibility to be obtained from correlations based upon either spectral samples or conventional time-domain correlations. Finally, exploiting the insight gained from MVDR modeling, we introduce a novel class of constrained all-pole models for efficient spectral coding. In this approach, a subset of the Line Spectral Frequency (LSF) parameters associated with the all-pole model are judiciously fixed, leading to a simpler model parameterization.
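
For readers unfamiliar with the MVDR spectrum, the following sketch evaluates the standard Capon form 1 / (v^H R^{-1} v) from a set of autocorrelation values. It is a minimal sketch under stated assumptions: the autocorrelation sequence is assumed to give a positive definite Toeplitz matrix, normalization conventions differ between texts, and neither the authors' fast computation nor their constrained LSF models are reproduced. The function name is hypothetical.

```python
import numpy as np

def mvdr_spectrum(r, omegas):
    """MVDR (Capon) spectrum of order M from autocorrelations r[0..M]."""
    r = np.asarray(r, dtype=float)
    m = len(r) - 1
    # Symmetric Toeplitz autocorrelation matrix R[i, j] = r[|i - j|].
    lags = np.abs(np.subtract.outer(np.arange(m + 1), np.arange(m + 1)))
    R_inv = np.linalg.inv(r[lags])
    # Each row of V is the steering vector [1, e^{jw}, ..., e^{jMw}].
    V = np.exp(1j * np.outer(omegas, np.arange(m + 1)))
    denom = np.einsum("ik,kl,il->i", V.conj(), R_inv, V).real
    return 1.0 / denom

# White noise check: r = [1, 0, 0] gives a flat spectrum of 1 / (M + 1).
flat = mvdr_spectrum([1.0, 0.0, 0.0], np.array([0.0, 1.0, 2.0]))
```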


Improvement of MBSD by Scaling Noise Masking Threshold and Correlation Analysis with MOS Difference Instead of MOS

Authors:

Wonho Yang,
Robert Yantorno,

Page (NA) Paper number 2178

Abstract:

The Modified Bark Spectral Distortion (MBSD), an objective speech quality measure, was presented previously [1][2]. The MBSD measure estimates speech distortion in the loudness domain, taking into account the noise masking threshold so that only audible distortions are included in the calculation of the distortion measure. Preliminary simulation results have shown improvement of the MBSD over the conventional BSD. In this paper, the performance of the MBSD is improved by scaling the noise masking threshold, and the improved measure is compared with the ITU-T Recommendation P.861 [3] and MNB [4] measures. Correlation analysis with MOS difference instead of MOS has been examined in order to evaluate objective speech quality measures.


Performance Bounds for LPC Spectrum Quantization

Authors:

Per Hedelin,
Jan Skoglund,
Jonas Samuelsson,

Page (NA) Paper number 2332

Abstract:

This paper presents a method for obtaining numerical estimates of high rate vector quantization (VQ) performance suitable for sources for which the pdf is not analytically available. In the proposed method, the VQ point density is described from a Gaussian mixture model optimized for the data. Employing this method for LPC spectrum quantization, we obtain high rate expressions for both the average spectral distortion (SD) and the distribution function of the SD. We estimate the minimum bits required for a quantizer to obtain an average SD of 1 dB and the outlier statistics for that quantizer. We find that approximately 3 bits can be saved as compared to a 2-split LSF-based vector quantizer.
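
The spectral distortion (SD) figure quoted above is commonly computed as the root-mean-square difference, in dB, between the log power spectra of the original and quantized all-pole models. The sketch below shows that common definition only. It is a minimal sketch under stated assumptions: the spectra are sampled on a uniform grid over [0, pi), the inputs are predictor polynomial coefficients [1, a1, ..., aN], and the function names and grid size are hypothetical. The paper's high-rate theory and Gaussian mixture modeling are not reproduced.

```python
import numpy as np

def lpc_log_spectrum_db(a, n_freq=256):
    """Log power spectrum (dB) of the all-pole model 1 / A(e^{jw})."""
    w = np.linspace(0.0, np.pi, n_freq, endpoint=False)
    k = np.arange(len(a))
    A = np.exp(-1j * np.outer(w, k)) @ np.asarray(a, dtype=float)
    return -20.0 * np.log10(np.abs(A))

def spectral_distortion_db(a_ref, a_quant, n_freq=256):
    """RMS log-spectral difference in dB between two all-pole models."""
    d = lpc_log_spectrum_db(a_ref, n_freq) - lpc_log_spectrum_db(a_quant, n_freq)
    return float(np.sqrt(np.mean(d ** 2)))

# A pure gain change of 2 shifts the whole log spectrum by 20*log10(2) dB.
sd_gain = spectral_distortion_db([1.0, -0.9], [2.0, -1.8])
```

Averaging this per-frame value over a test set gives the average SD, and counting frames above fixed thresholds gives the outlier statistics.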


Channel Optimized Predictive VQ

Authors:

Jan Lindén,

Page (NA) Paper number 2417

Abstract:

In this paper, combined source-channel coding is considered for the case of predictive vector quantization. A design algorithm for channel optimized predictive vector quantizers is proposed. Under reasonable assumptions, the optimal encoder is presented and a sample iterative design method that simultaneously optimizes the predictor and the codebook is derived. We also demonstrate that this design method can be used to obtain index assignments that are superior to those obtained by post-processing index assignment algorithms. Results are presented for a correlated Gauss-Markov process and for speech LSF parameters.
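
The core of channel optimized VQ is an encoder that minimizes the expected distortion after transmission instead of the distortion to the nearest codevector. The sketch below shows that encoding rule for the memoryless (non-predictive) case. It is a minimal sketch under stated assumptions: the channel is modeled as a binary symmetric channel acting independently on each index bit, the codebook is given, and the function names and example values are hypothetical. The predictor and the joint design algorithm from the paper are not reproduced.

```python
import numpy as np

def bsc_index_transition(n_bits, ber):
    """trans[i, j] = P(receive index j | send index i) over a BSC."""
    n = 1 << n_bits
    ham = np.array([[bin(i ^ j).count("1") for j in range(n)]
                    for i in range(n)])
    return ber ** ham * (1.0 - ber) ** (n_bits - ham)

def covq_encode(x, codebook, trans):
    """Pick the index with the smallest expected squared error at the decoder."""
    d = np.sum((codebook - np.asarray(x, dtype=float)) ** 2, axis=1)
    expected = trans @ d  # expected[i] = sum_j P(j | i) * d[j]
    return int(np.argmin(expected))

# Illustrative 2-bit codebook (made-up values).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [5.0, 5.0]])
noiseless = bsc_index_transition(2, 0.0)
noisy = bsc_index_transition(2, 0.1)
index = covq_encode([0.9, 0.9], codebook, noiseless)
```

With a bit error rate of zero the transition matrix is the identity and the rule reduces to an ordinary nearest neighbor search.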

