ICASSP99 Topics in Speech Coding

Topics in Speech Coding
Performance Assessment Of Tandem
Connection Of Enhanced Cellular Coders
Authors: Simao F Campos Neto, Franklin L Corcoran, Ara Karahisar
Paper number 1313
Abstract: The growth of, and increased competition in, the second-generation (digital) cellular communication market have led service providers to improve the speech quality of their systems by introducing enhanced speech coders. Advances in speech coding have allowed designers to aim at toll quality for these enhanced coders, and an investigation of the impact of speech coders on the end-to-end quality of the public switched telephone network (PSTN) is necessary. This paper continues a series of studies on the impact of tandem connection of cellular systems; it examines the quality of the enhanced cellular coders for the major systems in use today in the context of PSTN interconnection. A major conclusion of this study is that deploying enhanced coders in second-generation cellular systems makes possible a substantial increase in the quality of cellular connections when they are in tandem with other speech coders in long-haul international networks.

TTS Based Very Low Bit Rate Speech Coder
Authors: Ki-Seung Lee, Richard V. Cox
Paper number 1457
Abstract: This paper addresses a speech coder that uses a Text-To-Speech (TTS) synthesis system to achieve very low bit rates (below 1 kbps). The main issue of the work is the accurate coding of the pitch (F0) and gain contours, which are principal components of prosody. This is of paramount interest because correct prosody increases naturalness and an efficient coding scheme provides high coding gain. Together with the phonetic transcription, the F0 and gain contours constitute the parameters that the TTS system needs to synthesize the speech signal. Piecewise linear approximation is used to code the F0 parameter.
A technique that minimizes the bit rate while keeping the F0 error below a given threshold is described. To obtain both high compression and smoothly changing gain contours, the variance of the signal, averaged over each half-phoneme length, is transmitted as gain information. With single-speaker stimuli and a priori text transcription information, we obtained natural-sounding speech at an average bit rate of about 300 bps.

Wideband Speech Coding With Toll Quality Based On IA-Model
Authors:
Ling Kok Ng, School of EEE, Nanyang Technological University, Singapore 639798 (Singapore)
Gang Li, School of EEE, Nanyang Technological University, Singapore 639798 (Singapore)
Xiao Lin, Center for Signal Processing, Nanyang Technological University, Singapore 639798 (Singapore)
Guoan Bi, School of EEE, Nanyang Technological University, Singapore 639798 (Singapore)
Paper number 1516
Abstract: In this paper, we propose an instantaneous amplitude (IA) based model for speech signal representation. This avoids the difficulty of dealing with time-varying phases and allows an optimization procedure to be performed easily, so that the synthetic signal can be made as close to the original as possible. A simplified frequency-picking algorithm is derived to shorten the processing time while still maintaining the quality of the synthetic speech. Experiments show that the synthetic speech produced with the developed technique is of toll quality and almost perceptually indistinguishable from the original speech. Initial work on coding the parameters of the IA model for speech sampled at 16 kHz has been done, and toll-quality synthesized speech at a bit rate of 40 kbps is achieved.
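As a rough reading of the figures quoted in the abstract above, 40 kbps for speech sampled at 16 kHz works out to 2.5 coded bits per sample. The short Python sketch below only reproduces that arithmetic; the function names are illustrative, and the 16-bit PCM input resolution used for the compression ratio is an assumption, since the abstract does not state it.

```python
def bits_per_sample(bit_rate_bps: float, sample_rate_hz: float) -> float:
    """Average number of coded bits spent on each input sample."""
    return bit_rate_bps / sample_rate_hz


def compression_ratio(bit_rate_bps: float, sample_rate_hz: float,
                      pcm_bits: int = 16) -> float:
    """Ratio of the uncompressed PCM rate to the coded rate.

    pcm_bits=16 is an assumption made for illustration; the abstract
    does not give the resolution of the input signal.
    """
    return (pcm_bits * sample_rate_hz) / bit_rate_bps


if __name__ == "__main__":
    # Figures taken from the abstract: 40 kbps at a 16 kHz sampling rate.
    print(bits_per_sample(40_000, 16_000))
    print(compression_ratio(40_000, 16_000))
```

Under the 16-bit assumption the uncompressed rate would be 256 kbps, so the reported 40 kbps corresponds to a compression ratio of about 6.4 to 1.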

4 kb/s Multi-Pulse Based CELP Speech Coding Using Excitation Switching
Authors: Kazunori Ozawa
Paper number 1656
Abstract: This paper proposes MP-CELP (Multi-Pulse-based CELP) speech coding at 4 kb/s. In MP-CELP, the amplitudes or signs of the multi-pulse excitation are simultaneously vector quantized (VQ). To improve speech quality under background noise conditions, the excitation signal is switched between voiced and unvoiced speech, and the number of pulses is greatly increased for unvoiced speech by restricting pulse locations. Further, to improve voiced speech quality, a delayed-decision search selects the combination of adaptive codebook lag, pulse location, sign codevector and gain codevector that minimizes distortion. The subjective evaluation results show that the speech quality of 4 kb/s MP-CELP is close to that of ITU-T G.723.1 (6.3 kb/s) and G.729 (8 kb/s) in the M-IRS clean speech condition. Under background noise conditions, introducing the excitation switching and the pulse location restriction significantly improves the MOS value, by 0.4. However, further improvement is still required, except for the interfering talker condition.

An Adaptive Multi-Rate Speech Coder For Digital Cellular Telephony
Authors: Erdal Paksoy, Juan Carlos De Martin, Alan V McCree, Christian G Gerlach, Anand Anandakumar, Wai-Ming Lai, Vishu R Viswanathan
Paper number 1986
Abstract: We have developed an adaptive multi-rate (AMR) speech coder designed to operate over the GSM digital cellular full rate (22.8 kb/s) and half rate (11.4 kb/s) channels and to maintain high quality in the presence of highly varying background noise and channel conditions. Within each total rate, several codec modes with different source/channel bit rate allocations are used.
The speech coders in each codec mode are based on the CELP algorithm and operate at rates ranging from 11.85 kb/s down to 5.15 kb/s; the lowest-rate coder is a source-controlled multi-modal speech coder. The decoders monitor channel quality at both ends of the wireless link using the soft values of the received bits and assist the base station in selecting the codec mode that is appropriate for a given channel condition. The coder was submitted to the GSM AMR standardization competition and met the qualification requirements in an independent formal MOS test.

An Adaptive Post-Filtering Technique Based on The Modified Yule-Walker Filter
Authors:
Azhar Mustapha, COMSAT Laboratories, Clarksburg, Maryland, USA (USA)
Suat Yeldener, COMSAT Laboratories, Clarksburg, Maryland, USA (USA)
Paper number 2024
Abstract: This paper presents an adaptive time-domain post-filtering technique based on the modified Yule-Walker filter. Conventionally, the post-filter is derived from the original LPC spectrum. In general, this conventional time-domain technique produces an unpredictable spectral tilt that is hard to control through the modified LPC synthesis, inverse, and high-pass filtering, and it causes unnecessary attenuation or amplification of some frequency components, which muffles the speech. This effect increases when voice coders are connected in tandem. Another approach to post-filter design was developed by McAulay and Quatieri, but it can only be used in sinusoidal-based speech coders. We have developed a new time-domain post-filtering technique. It eliminates the problem of spectral tilt in the speech spectrum and can be applied to various speech coders. The new post-filter has a flat frequency response at the formant peaks of the speech spectrum.
Instead of relying on the modified LPC synthesis, inverse, and high-pass filtering of the conventional time-domain technique, the new technique gathers information about the poles of the LPC spectrum. This post-filtering technique has been used in a 4 kb/s Harmonic Excitation Linear Predictive Coder (HE-LPC), and subjective listening tests have indicated that it outperforms the conventional technique in both one and two tandem connections.

A Modular Approach to Speech Enhancement with an Application to Speech Coding
Authors: Anthony J Accardi, Richard V. Cox
Paper number 2099
Abstract: Ephraim and Malah's MMSE-LSA speech enhancement algorithm, while robust and effective, is difficult to tune and adjust for the tradeoff between noise reduction and distortion. We suggest a means of generalizing this design that allows estimators other than the MMSE-LSA to be used within the same supporting framework. When a modified version of Ephraim and Van Trees's spectral domain constrained signal subspace estimator is used in this manner, we obtain a system with greater flexibility and similar performance. We also explore the possibility of using different speech enhancement techniques as pre-processors for different parameter extraction modules of the IS-641 speech coder. We show that such a strategy can increase the quality of the coded speech and lead to a system that is more robust to differing noise types.

On Speech Coding in a Perceptual Domain
Authors: Gernot Kubin, W. Bastiaan Kleijn
Paper number 2327
Abstract: In many speech coders, the distortion criterion operates on the speech signal or on a signal obtained by adaptive linear filtering of the speech signal. To satisfy computational and delay constraints, the distortion criterion must be reduced to a very simple approximation of the auditory system.
This drawback of conventional approaches motivates a new speech coding paradigm in which the coding is performed in a domain where the single-letter squared-error criterion forms an accurate representation of perception. The new paradigm requires a model of the auditory periphery that is accurate, can be inverted with relatively low computational effort, and represents the signal with relatively few parameters. In this paper we develop such a model of the auditory periphery and discuss its suitability for speech coding. Our results indicate that the new paradigm in general, and our auditory model in particular, form a promising basis for speech and audio coding.