Speech Analysis

Speech Analysis/Synthesis/Conversion by Using Sequential Processing

Authors:

Boonpramuk Panuthat,
Tetsuo Funada,
Noboru Kanedera

Paper number 1618

Abstract:

This paper presents a method for speech analysis/synthesis/conversion using sequential processing. The aims of this method are to improve the quality of synthesized speech and to convert the original speech into speech with different characteristics. We apply the Kalman filter to estimate the auto-regressive coefficients of the 'time-varying AR model with unknown input (ARUI model)', which we have proposed as an improvement on the conventional AR model, and we use a band-pass filter to construct a 'guide signal' for extracting the pitch period from the residual signal. These signals are used to build the driving source signal for speech synthesis. We also use the guide signal for speech conversion, such as changes in pitch and utterance length. Moreover, we show experimentally that this method can analyze/synthesize/convert speech without instability by using smoothed auto-regressive coefficients.
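The coefficient-tracking step can be sketched in miniature. The following is a minimal illustration, not the authors' ARUI formulation: it tracks a single time-varying AR(1) coefficient with a scalar Kalman filter under an assumed random-walk state model, on synthetic data; the noise variances `q` and `r` and the true coefficient 0.8 are assumed values for the demo.

```python
import random

def track_ar1_kalman(y, q=1e-4, r=1.0):
    """Track a slowly varying AR(1) coefficient with a scalar Kalman filter.

    Assumed model (not the paper's ARUI model):
        a_t = a_{t-1} + w_t,        w_t ~ N(0, q)   (random-walk coefficient)
        y_t = a_t * y_{t-1} + e_t,  e_t ~ N(0, r)   (AR(1) observation)
    """
    a_hat, p = 0.0, 1.0              # coefficient estimate and its variance
    estimates = []
    for t in range(1, len(y)):
        h = y[t - 1]                 # the "observation matrix" is the previous sample
        p += q                       # predict: variance grows by the process noise
        k = p * h / (h * h * p + r)  # Kalman gain
        a_hat += k * (y[t] - h * a_hat)  # correct with the innovation
        p *= 1.0 - k * h
        estimates.append(a_hat)
    return estimates

# Demo on synthetic data with a fixed true coefficient of 0.8.
random.seed(0)
y = [0.0]
for _ in range(2000):
    y.append(0.8 * y[-1] + random.gauss(0.0, 1.0))
est = track_ar1_kalman(y)
```

Smoothing the estimated coefficient track, as the abstract suggests, would further reduce estimator jitter before synthesis.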

IC991618.PDF (From Author) IC991618.PDF (Rasterized)


Modelling Energy Flow in the Vocal Tract with Applications to Glottal Closure and Opening Detection

Authors:

D. Mike Brookes,
Han Pin Loke

Paper number 1864

Abstract:

The pitch-synchronous analysis that is used in several areas of speech processing often requires robust detection of the instants of glottal closure and opening. In this paper we derive expressions for the flow of acoustic energy in the lossless-tube model of the vocal tract and show how linear predictive analysis may be used to estimate the waveform of acoustic input power at the glottis. We demonstrate that this signal may be used to identify the instants of glottal closure and opening during voiced speech and contrast it with the LPC residual signal that previous authors have used for this purpose.
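The LPC residual that the paper compares against can be computed from the autocorrelation sequence via the Levinson-Durbin recursion. A minimal sketch of that standard textbook procedure (not the authors' energy-flow method; the AR(1) demo signal is synthetic):

```python
import random

def autocorr(x, max_lag):
    """Autocorrelation estimates r[0..max_lag]."""
    return [sum(x[t] * x[t - k] for t in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns (a, err) with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                   # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k               # prediction-error energy shrinks
    return a, err

def lpc_residual(x, a):
    """Prediction error e[t] = sum_j a[j] * x[t - j]; glottal events show up here."""
    p = len(a) - 1
    return [sum(a[j] * x[t - j] for j in range(p + 1)) for t in range(p, len(x))]

# Demo: fit a 2nd-order predictor to a synthetic AR(1) signal (true coefficient 0.9).
random.seed(1)
x = [0.0]
for _ in range(3000):
    x.append(0.9 * x[-1] + random.gauss(0.0, 1.0))
a, err = levinson_durbin(autocorr(x, 2), 2)
res = lpc_residual(x, a)
```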

IC991864.PDF (From Author) IC991864.PDF (Rasterized)


Fitting the Mel Scale

Authors:

Srinivasan Umesh, Indian Institute of Technology (India)
Leon Cohen,
Douglas J Nelson, Dept. of Defense USA (USA)

Paper number 2167

Abstract:

We show that there are many qualitatively different equations, each with few parameters, that fit the experimentally obtained Mel scale. We investigate the often-made remark that the Mel scale has two regions, the first region (below roughly 1000 Hz) being linear and the upper region logarithmic. We show that there is no evidence, based on the experimental data points, that there are two qualitatively different regions, or that the lower region is linear and the upper region logarithmic. In fact, F_M = f/(af + b), where F_M and f are the mel and physical frequency respectively, fits better than a line in the ``linear'' region or a logarithm in the ``log'' region.
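Because 1/F_M = a + b/f, the proposed rational form is linear in (a, b) after inverting the data, so the two parameters can be fitted by ordinary least squares. A minimal sketch, using the common log-form mel approximation as stand-in data since the paper's experimental points are not reproduced here:

```python
import math

def mel_log(f):
    """A common analytic mel approximation, used here only as stand-in data."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def fit_rational_mel(freqs, mels):
    """Fit F_M = f / (a*f + b) by least squares on 1/F_M = a + b * (1/f)."""
    xs = [1.0 / f for f in freqs]
    ys = [1.0 / m for m in mels]
    n = float(len(xs))
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope     -> b
    a = (sy - b * sx) / n                          # intercept -> a
    return a, b

freqs = [float(f) for f in range(100, 8001, 100)]
a, b = fit_rational_mel(freqs, [mel_log(f) for f in freqs])
```

The fitted a and b here depend on the stand-in curve; the paper fits the experimental data points directly.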

IC992167.PDF (From Author) IC992167.PDF (Rasterized)


Fast Accent Identification and Accented Speech Recognition

Authors:

Wai Kat Liu,
Pascale Fung

Paper number 2349

Abstract:

The performance of speech recognition systems degrades when the speaker's accent differs from that of the training set. Both accent-independent and accent-dependent recognition require the collection of additional training data. In this paper, we propose a faster accent-classification approach using phoneme-class models. We also present our findings on acoustic features sensitive to a Cantonese accent, and possibly to other Asian-language accents. In addition, we show how a native-accent pronunciation dictionary can be rapidly transformed into one for accented speech simply by using knowledge of the foreign speaker's native language. The use of this accent-adapted dictionary reduces the recognition error rate by 13.5%, similar to the result obtained from a longer, data-driven process.

IC992349.PDF (From Author) IC992349.PDF (Rasterized)


Relevancy of Time-Frequency Features for Phonetic Classification Measured by Mutual Information

Authors:

Howard H Yang,
Sarel J Van Vuuren,
Hynek Hermansky

Paper number 2454

Abstract:

In this paper we use mutual information to study the distribution in time and frequency of information relevant for phonetic classification. A large database of hand-labeled fluent speech is used to (a) compute the mutual information between phoneme labels and a point of logarithmic energy in the time-frequency plane and (b) compute the joint mutual information between phoneme labels and two points of logarithmic energy in the time-frequency plane.
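Computation (a) reduces to an empirical mutual-information estimate over joint counts once the energy values are quantized. A minimal sketch of that estimator; the quantizer step and the label/energy pairing are illustrative, not the paper's setup:

```python
import math
from collections import Counter

def quantize(value, step=1.0):
    """Crude scalar quantizer so continuous energies become discrete symbols."""
    return round(value / step)

def mutual_information(pairs):
    """Empirical I(X; Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)               # joint counts
    px = Counter(x for x, _ in pairs)  # marginal counts of x
    py = Counter(y for _, y in pairs)  # marginal counts of y
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with all p's as count ratios
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi
```

The joint version (b) follows the same pattern with pairs of energy points, i.e. samples of the form (label, (e1, e2)).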

IC992454.PDF (From Author) IC992454.PDF (Rasterized)


Hidden Markov Models Based on Multi-Space Probability Distribution for Pitch Pattern Modeling

Authors:

Keiichi Tokuda, Nagoya Institute of Technology, Nagoya, Japan (Japan)
Takashi Masuko, Tokyo Institute of Technology, Japan (Japan)
Noboru Miyazaki, NTT Basic Research Laboratories, Japan (Japan)
Takao Kobayashi, Tokyo Institute of Technology, Japan (Japan)

Paper number 2479

Abstract:

This paper discusses a hidden Markov model (HMM) based on a multi-space probability distribution (MSD). HMMs are widely used statistical models for characterizing sequences of speech spectra and have been applied successfully to speech recognition systems. This suggests that the HMM should also be useful for modeling the pitch patterns of speech. However, the conventional discrete or continuous HMMs cannot be applied to pitch pattern modeling, since the observation sequence of a pitch pattern is composed of one-dimensional continuous values and a discrete symbol representing ``unvoiced''. The MSD-HMM includes the discrete HMM and the continuous mixture HMM as special cases, and can further model sequences of observation vectors of variable dimension, including zero-dimensional observations, i.e., discrete symbols. As a result, MSD-HMMs can model pitch patterns without heuristic assumptions. We derive a reestimation algorithm for the extended HMM and show that it finds a critical point of the likelihood function.
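The multi-space output distribution can be sketched for the pitch case: a one-dimensional "voiced" space modeled by a Gaussian mixture, plus a zero-dimensional "unvoiced" space carrying only a weight. The following is an illustrative reading of that idea, not the paper's exact notation or parameterization:

```python
import math

def gauss(x, mu, var):
    """One-dimensional Gaussian density."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def msd_output_prob(obs, voiced_mix, w_unvoiced):
    """Output probability of one MSD state (illustrative).

    obs is ("unvoiced",) or ("voiced", f0); voiced_mix is a list of
    (weight, mean, var) whose weights together with w_unvoiced sum to 1.
    """
    if obs[0] == "unvoiced":
        return w_unvoiced              # zero-dimensional space: weight alone
    _, f0 = obs
    # one-dimensional space: weighted sum of Gaussian densities at f0
    return sum(w * gauss(f0, mu, var) for w, mu, var in voiced_mix)
```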

IC992479.PDF (From Author) IC992479.PDF (Rasterized)


An Algorithm for Glottal Volume Velocity Estimation

Authors:

Ashraf Alkhairy

Paper number 2492

Abstract:

We present a new method for the estimation of the glottal volume velocity from voiced segments of the radiated acoustic speech waveform. Our algorithm is based on spectral factorization of the signal and is a general purpose procedure. It does not suffer from residual effects or assume constraining models for the vocal tract and the glottal source, as is commonly the case with existing methods. The resulting estimate of the glottal volume velocity is accurate and can be used for modeling and synthesis purposes.

IC992492.PDF (From Author) IC992492.PDF (Rasterized)


Frame-Level Noise Classification in Mobile Environments

Authors:

Khaled El-Maleh,
Ara Samouelian,
Peter Kabal

Paper number 1774

Abstract:

Background environmental noise degrades the performance of speech-processing systems (e.g., speech coding, speech recognition). By adapting the processing to the type of background noise, performance can be enhanced. This requires noise classification. In this paper, four pattern-recognition frameworks are used to design noise-classification algorithms. Classification is done on a frame-by-frame basis (e.g., once every 20 ms). Five noises commonly encountered in mobile telephony (car, street, babble, factory, and bus) are considered in our study. Our experimental results show that line spectral frequencies (LSFs) are robust features for distinguishing the different classes of noise.
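Frame-level classification over per-frame feature vectors such as LSFs can be illustrated with the simplest possible pattern-recognition framework, a nearest-centroid classifier. The paper's actual classifiers and features are richer; the class labels and numbers below are made-up stand-ins, and LSF extraction itself is omitted:

```python
def train_centroids(frames_by_class):
    """frames_by_class maps a noise label to its training feature frames."""
    centroids = {}
    for label, frames in frames_by_class.items():
        dim = len(frames[0])
        # per-dimension mean of the class's training frames
        centroids[label] = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return centroids

def classify_frame(frame, centroids):
    """Label a single frame by the nearest class centroid (squared Euclidean)."""
    def dist2(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))
    return min(centroids, key=lambda label: dist2(frame, centroids[label]))
```

In a real system, `classify_frame` would be called once per 20 ms analysis frame on the LSF vector of that frame.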

IC991774.PDF (From Author) IC991774.PDF (Rasterized)
