ICASSP99 Speech and Audio Technologies

Speech and Audio Technologies
A DSP Powered Solid State Audio System
Authors: Jason D Kridner, Mark T Nadeski, Pedro R Gelabert
Page (NA)
Paper number 2137
Abstract: New audio compression algorithms and non-volatile flash memory technology have enabled the creation of portable solid-state personal audio players. This paper presents a low-power portable audio system based on the Texas Instruments TMS320C5000 DSP family. This system is designed to play music and other audio media stored on flash memory cards that can hold over an hour of CD-quality music. The flash card provides higher audio quality than a cassette tape, yet is smaller and more durable than a CD. The music or audio material downloaded to the flash card is obtained from licensed distributors, either through the internet or through kiosks set up in retail outlets. All the audio decoding and watermarking required by this system are handled by the TMS320C5000 DSP. System performance characteristics are also presented.
IC992137.PDF (From Author) IC992137.PDF (Rasterized)

New Implementation Techniques Of A Real-Time MPEG-2 Audio Encoding System
Authors: Hyen-O Oh, Sung-Youn Kim, Dae-Hee Youn, Il-Whan Cha
Page (NA)
Paper number 1594
Abstract: In this study, new implementation techniques for a real-time MPEG-2 audio encoding system are presented. The system is developed using general-purpose DSPs. It consists of one master unit and five slave units, and its structure is based on our earlier work. Two fast algorithms are developed and applied to the most compute-intensive routines of the encoding process. These algorithms play a key role in improving overall system performance. The implemented system is designed to encode audio signals into an MPEG-2 Layer II bitstream in all configurations up to 5.1 channels and 640 kbps, and is intended to support state-of-the-art quality. The generated bitstream can be stored on a PC hard disk or sent to an integration system to be multiplexed with the corresponding video bitstream.
IC991594.PDF (From Author) IC991594.PDF (Rasterized)

An Improved Residual-Domain Phase/Amplitude Model for Sinusoidal Coding of Speech at Very Low Bit Rates: A Variable Rate Scheme
Authors: Sassan Ahmadi
Page (NA)
Paper number 2193
Abstract: An improved harmonic sinusoidal model is presented, where the underlying sine wave amplitudes and phases are efficiently represented using a combination of linear prediction, linear phase alignment, all-pass filtering, and spectral sampling in the residual domain. The analysis and synthesis systems are introduced, and the derivation and encoding of each model parameter are discussed. Performance analysis on a large database indicates effective modeling of the sinusoidal parameters. A variable-rate sinusoidal coder operating at an average bit rate of 1.75 kbps, based on the proposed model, has been developed, yielding reproduced speech of good quality, intelligibility, and naturalness. The proposed model may find applications in low bit rate speech coding in high-capacity wireless communication systems.
IC992193.PDF (From Author) IC992193.PDF (Rasterized)

Implementation of an Enhanced Fixed Point Variable Bit-Rate MELP Vocoder on TMS320C549
Authors: Ali E Ertan, Emre B Aksu, Hakki G Ilk, Haydar Karci, Önder Karpat, Taner Kolçak, Levent Şendur, Mubeccel Demirekler, Ahmet Enis Çetin
Page (NA)
Paper number 1758
Abstract: In this paper, a fixed-point Variable Bit-Rate (VBR) Mixed Excitation Linear Predictive Coding (MELP) vocoder is presented. The VBR-MELP vocoder is implemented on a TMS320C54x and achieves quality virtually indistinguishable from federal standard MELP at bit rates between 1.0 and 1.6 kb/s. The backbone of the VBR-MELP vocoder is similar to that of federal standard MELP. It utilizes a novel sub-band based voice activity detector in the back end of the encoder to discriminate background noise from speech activity.
Since the proposed detector uses only parameters extracted in the encoder, its computational complexity is very low.
IC991758.PDF (From Author) IC991758.PDF (Rasterized)

Improving EVRC Half Rate by the Algebraic VQ-CELP
Authors: Fenghua Liu, Ryan Heidari
Page (NA)
Paper number 3020
Abstract: This paper presents an algebraic vector quantized codebook excited linear prediction (AVQ-CELP) speech codec. The objective is to enhance the half rate mode of IS-127, the enhanced variable rate codec (EVRC). In the AVQ-CELP scheme, only the perceptually important components are encoded, and the components are selected in a way similar to ACELP. An open-loop procedure is used to select the sub-vectors. The selected sub-vectors are concatenated and vector quantized. An analysis-by-synthesis strategy is used to determine the optimal excitation. The generalized Lloyd algorithm (GLA) is used to optimize the AVQ codebook. In order to improve the synthesis quality of voiced frames, a two-pulse version of ACELP is used in strongly voiced frames. The proposed algorithm was incorporated in the Nokia CDMA handset prototype. In a joint effort with SK Telecom, a field test was performed in Korea to evaluate the performance of the proposed AVQ algorithm. The results indicate a considerable improvement relative to the standard EVRC operating at the maximum half-rate.
IC993020.PDF (From Author) IC993020.PDF (Rasterized)

A 4 kbps Adaptive Fixed Code-Excited Linear Prediction Speech Coder
Authors:
Hong Kook Kim, AT&T Labs Research, Rm. E148, 180 Park Avenue, Florham Park NJ 07932, USA
Mi Suk Lee, Dept. of Electrical Eng., Korea Advanced Institute of Science and Technology, 373-1 Kusong-Dong, Yusong-Gu, Taejon 305-701, Korea
Hwang Soo Lee, Dept. of Electrical Eng., Korea Advanced Institute of Science and Technology, 373-1 Kusong-Dong, Yusong-Gu, Taejon 305-701, Korea
Page (NA)
Paper number 1329
Abstract: In this paper, we propose an adaptive fixed code-excited linear prediction (AF-CELP) speech coder operating at 4 kbps. By exploiting the fact that the fixed codebook contribution to the speech signal is periodic, like the corresponding adaptive codebook contribution, the adaptive fixed codebook model represents excitation signals efficiently. In order to overcome the quality degradation caused by the coarse quantization of excitation, a paired-pulse algebraic codebook structure is also applied to the excitation model. Additionally, pitch prefiltering, noise spreading, and harmonic enhancement techniques are adopted in the decoding process. Spectrogram reading and informal listening tests showed that the AF-CELP coder reproduces high-quality speech.
IC991329.PDF (From Author)

Multimode Variable Bit Rate Speech Coding: An Efficient Paradigm For High-Quality Low-Rate Representation Of Speech Signal
Authors: Amitava Das, Andy DeJaco, Sharath Manjunath, Ananth Ananthapadmanabhan, Jeff Huang, Eddie Choy
Page (NA)
Paper number 3014
Abstract: The speech signal consists of a time-varying ensemble of different types of segments with distinct characteristics, which require different degrees of coding resolution in order to retain an overall high voice quality. A fixed-rate coder can capture such time-varying characteristics only if it operates at a high enough bit rate. At low bit rates, a fixed-rate coder will not be able to capture all of these various segments well and will fail to render high voice quality. A multimode variable bit rate (VBR) coder uses an arsenal of modes, operating at different bit rates. These modes are designed to represent these different speech segments optimally with the right amount of coding resolution.
Thus, a multimode VBR codec adapts the coding mechanism to the input speech and delivers high quality at low (average) rates. This paper presents the essential framework and the unique advantages of a multimode VBR codec and suggests algorithms for the different modes.
IC993014.PDF (From Author) IC993014.PDF (Rasterized)

Segmental Prototype Interpolation Coding
Authors:
Costas C.S. Xydeas, University of Manchester, UK
Thomas M Chapman, University of Manchester, UK
Page (NA)
Paper number 2340
Abstract: Current parametric speech coding schemes can achieve high communications quality speech at bit rates in the range of 2.4 to 1.5 kbits/sec. Most schemes sample and quantise, at regular intervals, the "tracks in time" generated by the parameters of the speech production model. As a result, reconstructed "parameter tracks" do not evolve "smoothly" with time. Furthermore, no advantage is taken of the "linguistic event" nature of speech. In this paper, model parameter "time tracks" are split into non-overlapping speech "event" related segments. These segment-based evolutions of model parameters are then vector quantised to provide at the receiver a smooth and subjectively meaningful reconstruction. Thus the paper presents an application of this generic segmental speech model quantisation approach to a 1.5 kbits/sec Prototype Interpolation Coding (PIC) system. Results indicate that the proposed methodology can almost halve the bit rate of this PIC system while preserving overall recovered speech quality.
IC992340.PDF (From Author) IC992340.PDF (Rasterized)