ICASSP99 Speech Analysis and Quantization

Speech Analysis and Quantization

Linguistic Mapping in LSF Space
for Low-Bit Rate Coding
Authors: John J Parry, Ian S Burnett, Joe F Chicharo
Paper number 1590
Abstract: In this paper we investigate the spectral density of Line Spectral Frequency (LSF) content in languages. The results show that the phonetic variation of languages is reflected in the LSF space. This leads to an alternative approach to the design of LSF quantisers. A trained LSF codebook, like the phonetic inventory of a language, is a static description of spectral behaviour of speech. As clear relationships exist between phonetic segments and LSFs, the structure of an LSF codebook can be analysed in terms of the phonetic segments. The new approach incorporates phonetic information into the structure of LSF codebooks through combining individual phonetic codebooks. The investigation leads to the conclusion that phonetic information can be usefully employed in codebook training in terms of perceptual performance and bit-rate reductions.
IC991590.PDF (From Author), IC991590.PDF (Rasterized)

Predictive Multiple-Scale Lattice VQ for LSF Quantization
Authors: Adriana Vasilache, Marcel Vasilache, Ioan Tabus
Paper number 1879
Abstract: This paper introduces a new lattice quantization scheme, the multiple-scale lattice vector quantization (MSLVQ), based on the truncation of the D10+ lattice. The codebook is composed of several copies of the truncated lattice scaled with different scaling factors. A fast nearest neighbor search is introduced. We compare the performance of predictive MSLVQ for quantization of LSF coefficients with the quantization technique used in the codec G.729 and show the better performance of our method in terms of spectral distortion. The MSLVQ scheme achieves the transparent quality at 21 bits/frame.
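
As a rough illustration of the idea in the MSLVQ abstract, the sketch below searches several scaled copies of a D_n+ lattice for the closest point to an input vector. It is not the authors' algorithm: the abstract gives neither their fast search nor their truncation rule, so the code assumes the standard rounding rule for D_n (Conway and Sloane), a hypothetical Euclidean-norm truncation `max_norm`, and the simplification that a scale whose nearest lattice point falls outside the truncation is skipped. The names `nearest_dn`, `nearest_dn_plus` and `mslvq_quantize` are invented for this example.

```python
import math


def dist2(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(x, y))


def nearest_dn(x):
    """Nearest point of D_n (integer vectors with an even coordinate sum)."""
    f = [math.floor(v + 0.5) for v in x]  # round every coordinate
    if sum(f) % 2 == 0:
        return f
    # Odd sum: re-round the coordinate with the largest rounding error
    # towards its other integer neighbour, which restores even parity.
    k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
    f[k] += 1 if x[k] > f[k] else -1
    return f


def nearest_dn_plus(x):
    """Nearest point of D_n+ = D_n union (D_n + (1/2, ..., 1/2))."""
    a = [float(v) for v in nearest_dn(x)]
    b = [v + 0.5 for v in nearest_dn([u - 0.5 for u in x])]
    return min((a, b), key=lambda p: dist2(x, p))


def mslvq_quantize(x, scales, max_norm):
    """Return (scale_index, reconstruction) or None if no scale is usable.

    Assumption: the lattice is truncated to points with norm <= max_norm,
    and out-of-range candidates are skipped rather than projected.
    """
    best = None
    origin = [0.0] * len(x)
    for j, s in enumerate(scales):
        p = nearest_dn_plus([v / s for v in x])
        if dist2(p, origin) > max_norm ** 2:
            continue  # outside the (assumed) truncation region
        y = [s * v for v in p]
        d = dist2(x, y)
        if best is None or d < best[0]:
            best = (d, j, y)
    return None if best is None else (best[1], best[2])
```

For example, `mslvq_quantize([0.9, 0.2], [1.0, 0.5], 3.0)` picks the finer scale, while a tighter truncation such as `max_norm=1.0` forces the coarser one. The same functions work unchanged for 10-dimensional inputs, which is the dimension the abstract refers to.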
IC991879.PDF (From Author), IC991879.PDF (Rasterized)

A Rootfinding Algorithm for Line Spectral Frequencies
Authors: Joseph H Rothweiler
Paper number 2022
Abstract: Published techniques for computing line spectral frequencies generally avoid rootfinding methods because of concerns about convergence and complexity. However, this paper shows that stable predictor polynomials have properties that make rootfinding an attractive approach. It is well known that the problem of finding the LSFs for an Nth-order predictor polynomial can be reduced to the problem of finding the roots of a pair of polynomials of order N/2 with real roots. I extend this result by showing that these polynomials have the following properties:
- It is possible to select starting points for a Newton's rootfinding method such that the iteration will converge monotonically to the largest root.
- The Newton iteration can be modified to speed up the process while still maintaining good convergence properties.
In this paper, I present the rootfinding procedures with proofs of their good convergence properties. Finally, I present experimental results showing that this procedure performs well on speech signals, and that it can be implemented on fixed-point DSPs.
IC992022.PDF (From Author), IC992022.PDF (Rasterized)

Incorporation of Temporal Masking Effects into Bark Spectral Distortion Measure
Authors: Bob Novorita
Paper number 2037
Abstract: The objective of this paper is to extend a promising objective speech distortion measurement method, the Bark Spectral Distance (BSD) measure, with the auditory concepts of forward and backward temporal masking to improve its measurement accuracy. The results of this investigation show that automatic BSD-based speech quality ratings may be made to correlate better with existing MOS ratings by removing perceptually irrelevant areas of speech from the distance measure.
The correlation between the objective BSD measure and the subjective MOS measure increases from 0.91 to 0.98. The best results were found with a window duration of 128 samples, use of exponential-slope filter characteristics for both forward and backward masking effects, forward masking delays up to 100 msec, and a backward masking time advance of 40 msec.
IC992037.PDF (From Author), IC992037.PDF (Rasterized)

MVDR Based All-Pole Models For Spectral Coding Of Speech
Authors: Manohar N Murthi, Bhaskar D Rao
Paper number 2120
Abstract: We present several analytical properties of Minimum Variance Distortionless Response (MVDR) based all-pole models that demonstrate the advantages and usefulness of these models for speech spectral coding. In particular, we show that a sufficient order MVDR all-pole model provides a spectral envelope that fits a set of spectral samples exactly with a parameterization convenient for quantization purposes. In addition, we show that MVDR all-pole filters provide a monotonically decreasing spectral distortion with increasing filter order. Furthermore, we show that the MVDR all-pole filter possesses the flexibility to be obtained from correlations based upon either spectral samples or conventional time-domain correlations. Finally, exploiting the insight gained from MVDR modeling, we introduce a novel class of constrained all-pole models for efficient spectral coding. In this approach, a subset of the Line Spectral Frequency (LSF) parameters associated with the all-pole model are judiciously fixed, leading to a simpler model parameterization.
IC992120.PDF (From Author), IC992120.PDF (Rasterized)

Improvement of MBSD by Scaling Noise Masking Threshold and Correlation Analysis with MOS Difference Instead of MOS
Authors: Wonho Yang, Robert Yantorno
Paper number 2178
Abstract: The Modified Bark Spectral Distortion (MBSD), used as an objective speech quality measure, was presented previously [1][2].
The MBSD measure estimates speech distortion in the loudness domain, taking into account the noise masking threshold so that only audible distortions are included in the calculation of the distortion measure. Preliminary simulation results have shown improvement of the MBSD over the conventional BSD. In this paper, the performance of the MBSD is improved by scaling the noise masking threshold, and the improved measure is compared with the ITU-T Recommendation P.861 [3] and MNB [4] measures. Correlation analysis with MOS difference instead of MOS has been examined in order to evaluate objective speech quality measures.
IC992178.PDF (From Author), IC992178.PDF (Rasterized)

Performance Bounds for LPC Spectrum Quantization
Authors: Per Hedelin, Jan Skoglund, Jonas Samuelsson
Paper number 2332
Abstract: This paper presents a method for obtaining numerical estimates of high rate vector quantization (VQ) performance suitable for sources for which the pdf is not analytically available. In the proposed method, the VQ point density is described from a Gaussian mixture model optimized for the data. Employing this method for LPC spectrum quantization, we obtain high rate expressions for both the average spectral distortion (SD) and the distribution function of the SD. We estimate the minimum bits required for a quantizer to obtain an average SD of 1 dB and the outlier statistics for that quantizer. We find that approximately 3 bits can be saved as compared to a 2-split LSF-based vector quantizer.
IC992332.PDF (From Author), IC992332.PDF (Rasterized)

Channel Optimized Predictive VQ
Authors: Jan Lindén
Paper number 2417
Abstract: In this paper combined source-channel coding is considered for the case of predictive vector quantization. A design algorithm for channel optimized predictive vector quantizers is proposed.
Under reasonable assumptions, the optimal encoder is presented and a sample iterative design method that simultaneously optimizes the predictor and the codebook is derived. We also demonstrate that this design method can be used to obtain index assignments that compare favorably with those obtained by post-processing index assignment algorithms. Results are presented for a correlated Gauss-Markov process and for speech LSF parameters.
IC992417.PDF (From Author), IC992417.PDF (Rasterized)
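
To make the notion of a channel-optimized encoder in the last abstract concrete, here is a minimal sketch of the usual channel-optimized VQ encoding rule: the encoder picks the index whose expected distortion, averaged over channel errors, is smallest. Several things are assumptions rather than details from the paper: the channel is taken to be a binary symmetric channel with bit error probability `eps`, the distortion is squared error, and the predictive loop and the joint predictor/codebook design described in the abstract are left out. The function names are invented for the example.

```python
def dist2(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(x, y))


def bsc_transition(i, j, bits, eps):
    """P(receive index j | send index i) on a binary symmetric channel."""
    d = bin(i ^ j).count("1")  # Hamming distance between the index bits
    return (eps ** d) * ((1.0 - eps) ** (bits - d))


def covq_encode(x, codebook, bits, eps):
    """Index minimising the expected distortion after channel errors.

    Assumes len(codebook) == 2 ** bits and that the decoder outputs
    codebook[j] for a received index j.
    """
    best_i, best_cost = None, float("inf")
    for i in range(len(codebook)):
        cost = sum(
            bsc_transition(i, j, bits, eps) * dist2(x, codebook[j])
            for j in range(len(codebook))
        )
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i


# Toy 2-bit scalar codebook with one far-away codevector at index 3.
codebook = [[0.0], [1.0], [2.0], [10.0]]
clean = covq_encode([1.9], codebook, bits=2, eps=0.0)
noisy = covq_encode([1.9], codebook, bits=2, eps=0.1)
```

With `eps=0.0` the rule reduces to an ordinary nearest-neighbour search and selects index 2. With `eps=0.1` it selects index 0 instead, because index 2 is one bit flip away from the distant codevector at index 3. This also hints at why index assignment matters: relabelling the codevectors changes which errors are costly.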