Speech Analysis

Speech Analysis/Synthesis/Conversion by Using Sequential Processing

Authors:

Boonpramuk Panuthat,
Tetsuo Funada,
Noboru Kanedera

Paper number 1618

Abstract:

This paper presents a method for speech analysis/synthesis/conversion using sequential processing. The aims of this method are to improve the quality of synthesized speech and to convert the original speech into speech with different characteristics. We apply the Kalman filter to estimate the auto-regressive coefficients of the 'time-varying AR model with unknown input (ARUI model)', which we have proposed as an improvement on the conventional AR model, and we use a band-pass filter to construct a 'guide signal' for extracting the pitch period from the residual signal. These signals are used to build the driving source signal for speech synthesis. We also use the guide signal for speech conversion, such as changes in pitch and utterance length. Moreover, we show experimentally that this method can analyze/synthesize/convert speech without instability by using smoothed auto-regressive coefficients.
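The coefficient-tracking step can be sketched in miniature. The following is a minimal illustration, not the authors' ARUI formulation: it tracks a single time-varying AR(1) coefficient with a scalar Kalman filter under an assumed random-walk state model, on synthetic data; the noise variances `q` and `r` and the true coefficient 0.8 are assumed values for the demo.

```python
import random

def track_ar1_kalman(y, q=1e-4, r=1.0):
    """Track a slowly varying AR(1) coefficient with a scalar Kalman filter.

    Assumed model (not the paper's ARUI model):
        a_t = a_{t-1} + w_t,        w_t ~ N(0, q)   (random-walk coefficient)
        y_t = a_t * y_{t-1} + e_t,  e_t ~ N(0, r)   (AR(1) observation)
    """
    a_hat, p = 0.0, 1.0              # coefficient estimate and its variance
    estimates = []
    for t in range(1, len(y)):
        h = y[t - 1]                 # the "observation matrix" is the previous sample
        p += q                       # predict: variance grows by the process noise
        k = p * h / (h * h * p + r)  # Kalman gain
        a_hat += k * (y[t] - h * a_hat)  # correct with the innovation
        p *= 1.0 - k * h
        estimates.append(a_hat)
    return estimates

# Demo on synthetic data with a fixed true coefficient of 0.8.
random.seed(0)
y = [0.0]
for _ in range(2000):
    y.append(0.8 * y[-1] + random.gauss(0.0, 1.0))
est = track_ar1_kalman(y)
```

Smoothing the estimated coefficient track, as the abstract suggests, would further reduce estimator jitter before synthesis.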

IC991618.PDF (From Author) IC991618.PDF (Rasterized)


Modelling Energy Flow in the Vocal Tract with Applications to Glottal Closure and Opening Detection

Authors:

D. Mike Brookes,
Han Pin Loke

Paper number 1864

Abstract:

The pitch-synchronous analysis that is used in several areas of speech processing often requires robust detection of the instants of glottal closure and opening. In this paper we derive expressions for the flow of acoustic energy in the lossless-tube model of the vocal tract and show how linear predictive analysis may be used to estimate the waveform of acoustic input power at the glottis. We demonstrate that this signal may be used to identify the instants of glottal closure and opening during voiced speech and contrast it with the LPC residual signal that previous authors have used for this purpose.
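The LPC residual that the paper compares against can be computed from the autocorrelation sequence via the Levinson-Durbin recursion. A minimal sketch of that standard textbook procedure (not the authors' energy-flow method; the AR(1) demo signal is synthetic):

```python
import random

def autocorr(x, max_lag):
    """Autocorrelation estimates r[0..max_lag]."""
    return [sum(x[t] * x[t - k] for t in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns (a, err) with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                   # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k               # prediction-error energy shrinks
    return a, err

def lpc_residual(x, a):
    """Prediction error e[t] = sum_j a[j] * x[t - j]; glottal events show up here."""
    p = len(a) - 1
    return [sum(a[j] * x[t - j] for j in range(p + 1)) for t in range(p, len(x))]

# Demo: fit a 2nd-order predictor to a synthetic AR(1) signal (true coefficient 0.9).
random.seed(1)
x = [0.0]
for _ in range(3000):
    x.append(0.9 * x[-1] + random.gauss(0.0, 1.0))
a, err = levinson_durbin(autocorr(x, 2), 2)
res = lpc_residual(x, a)
```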

IC991864.PDF (From Author) IC991864.PDF (Rasterized)


Fitting the Mel Scale

Authors:

Srinivasan Umesh, Indian Institute of Technology (India)
Leon Cohen,
Douglas J Nelson, Dept. of Defense USA (USA)

Paper number 2167

Abstract:

We show that there are many qualitatively different equations, each with few parameters, that fit the experimentally obtained Mel scale. We investigate the often-made remark that the Mel scale has two regions, the first region (below roughly 1000 Hz) being linear and the upper region logarithmic. We show that there is no evidence, based on the experimental data points, that there are two qualitatively different regions, or that the lower region is linear and the upper region logarithmic. In fact, F_M = f/(af + b), where F_M and f are the mel and physical frequency respectively, fits better than a line in the ``linear'' region or a logarithm in the ``log'' region.
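Because 1/F_M = a + b/f, the proposed rational form is linear in (a, b) after inverting the data, so the two parameters can be fitted by ordinary least squares. A minimal sketch, using the common log-form mel approximation as stand-in data since the paper's experimental points are not reproduced here:

```python
import math

def mel_log(f):
    """A common analytic mel approximation, used here only as stand-in data."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def fit_rational_mel(freqs, mels):
    """Fit F_M = f / (a*f + b) by least squares on 1/F_M = a + b * (1/f)."""
    xs = [1.0 / f for f in freqs]
    ys = [1.0 / m for m in mels]
    n = float(len(xs))
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope     -> b
    a = (sy - b * sx) / n                          # intercept -> a
    return a, b

freqs = [float(f) for f in range(100, 8001, 100)]
a, b = fit_rational_mel(freqs, [mel_log(f) for f in freqs])
```

The fitted a and b here depend on the stand-in curve; the paper fits the experimental data points directly.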

IC992167.PDF (From Author) IC992167.PDF (Rasterized)


Fast Accent Identification and Accented Speech Recognition

Authors:

Wai Kat Liu,
Pascale Fung

Paper number 2349

Abstract:

The performance of speech recognition systems degrades when the speaker's accent differs from that of the training set. Both accent-independent and accent-dependent recognition require the collection of additional training data. In this paper, we propose a faster accent-classification approach using phoneme-class models. We also present our findings on acoustic features sensitive to a Cantonese accent, and possibly to other Asian-language accents. In addition, we show how a native-accent pronunciation dictionary can be rapidly transformed into one for accented speech simply by using knowledge of the foreign speaker's native language. The use of this accent-adapted dictionary reduces the recognition error rate by 13.5%, similar to the result obtained from a longer, data-driven process.

IC992349.PDF (From Author) IC992349.PDF (Rasterized)


Relevancy of Time-Frequency Features for Phonetic Classification Measured by Mutual Information

Authors:

Howard H Yang,
Sarel J Van Vuuren,
Hynek Hermansky

Paper number 2454

Abstract:

In this paper we use mutual information to study the distribution in time and frequency of information relevant for phonetic classification. A large database of hand-labeled fluent speech is used to (a) compute the mutual information between phoneme labels and a point of logarithmic energy in the time-frequency plane and (b) compute the joint mutual information between phoneme labels and two points of logarithmic energy in the time-frequency plane.
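Computation (a) reduces to an empirical mutual-information estimate over joint counts once the energy values are quantized. A minimal sketch of that estimator; the quantizer step and the label/energy pairing are illustrative, not the paper's setup:

```python
import math
from collections import Counter

def quantize(value, step=1.0):
    """Crude scalar quantizer so continuous energies become discrete symbols."""
    return round(value / step)

def mutual_information(pairs):
    """Empirical I(X; Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)               # joint counts
    px = Counter(x for x, _ in pairs)  # marginal counts of x
    py = Counter(y for _, y in pairs)  # marginal counts of y
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with all p's as count ratios
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi
```

The joint version (b) follows the same pattern with pairs of energy points, i.e. samples of the form (label, (e1, e2)).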

IC992454.PDF (From Author) IC992454.PDF (Rasterized)


Hidden Markov Models Based on Multi-Space Probability Distribution for Pitch Pattern Modeling

Authors:

Keiichi Tokuda, Nagoya Institute of Technology, Nagoya, Japan (Japan)
Takashi Masuko, Tokyo Institute of Technology, Japan (Japan)
Noboru Miyazaki, NTT Basic Research Laboratories, Japan (Japan)
Takao Kobayashi, Tokyo Institute of Technology, Japan (Japan)

Paper number 2479

Abstract:

This paper discusses a hidden Markov model (HMM) based on a multi-space probability distribution (MSD). HMMs are widely used statistical models for characterizing sequences of speech spectra and have been applied successfully to speech recognition systems. This suggests that the HMM should also be useful for modeling the pitch patterns of speech. However, the conventional discrete or continuous HMMs cannot be applied to pitch pattern modeling, since the observation sequence of a pitch pattern is composed of one-dimensional continuous values and a discrete symbol representing ``unvoiced''. The MSD-HMM includes the discrete HMM and the continuous mixture HMM as special cases, and can further model sequences of observation vectors of variable dimension, including zero-dimensional observations, i.e., discrete symbols. As a result, MSD-HMMs can model pitch patterns without heuristic assumptions. We derive a reestimation algorithm for the extended HMM and show that it finds a critical point of the likelihood function.
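The multi-space output distribution can be sketched for the pitch case: a one-dimensional "voiced" space modeled by a Gaussian mixture, plus a zero-dimensional "unvoiced" space carrying only a weight. The following is an illustrative reading of that idea, not the paper's exact notation or parameterization:

```python
import math

def gauss(x, mu, var):
    """One-dimensional Gaussian density."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def msd_output_prob(obs, voiced_mix, w_unvoiced):
    """Output probability of one MSD state (illustrative).

    obs is ("unvoiced",) or ("voiced", f0); voiced_mix is a list of
    (weight, mean, var) whose weights together with w_unvoiced sum to 1.
    """
    if obs[0] == "unvoiced":
        return w_unvoiced              # zero-dimensional space: weight alone
    _, f0 = obs
    # one-dimensional space: weighted sum of Gaussian densities at f0
    return sum(w * gauss(f0, mu, var) for w, mu, var in voiced_mix)
```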

IC992479.PDF (From Author) IC992479.PDF (Rasterized)


An Algorithm for Glottal Volume Velocity Estimation

Authors:

Ashraf Alkhairy

Paper number 2492

Abstract:

We present a new method for the estimation of the glottal volume velocity from voiced segments of the radiated acoustic speech waveform. Our algorithm is based on spectral factorization of the signal and is a general purpose procedure. It does not suffer from residual effects or assume constraining models for the vocal tract and the glottal source, as is commonly the case with existing methods. The resulting estimate of the glottal volume velocity is accurate and can be used for modeling and synthesis purposes.

IC992492.PDF (From Author) IC992492.PDF (Rasterized)


Frame-Level Noise Classification in Mobile Environments

Authors:

Khaled El-Maleh,
Ara Samouelian,
Peter Kabal

Paper number 1774

Abstract:

Background environmental noise degrades the performance of speech-processing systems (e.g., speech coding, speech recognition). By adapting the processing to the type of background noise, performance can be enhanced. This requires noise classification. In this paper, four pattern-recognition frameworks are used to design noise-classification algorithms. Classification is done on a frame-by-frame basis (e.g., once every 20 ms). Five noises commonly encountered in mobile telephony (car, street, babble, factory, and bus) are considered in our study. Our experimental results show that line spectral frequencies (LSFs) are robust features for distinguishing the different classes of noise.
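Frame-level classification over per-frame feature vectors such as LSFs can be illustrated with the simplest possible pattern-recognition framework, a nearest-centroid classifier. The paper's actual classifiers and features are richer; the class labels and numbers below are made-up stand-ins, and LSF extraction itself is omitted:

```python
def train_centroids(frames_by_class):
    """frames_by_class maps a noise label to its training feature frames."""
    centroids = {}
    for label, frames in frames_by_class.items():
        dim = len(frames[0])
        # per-dimension mean of the class's training frames
        centroids[label] = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return centroids

def classify_frame(frame, centroids):
    """Label a single frame by the nearest class centroid (squared Euclidean)."""
    def dist2(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))
    return min(centroids, key=lambda label: dist2(frame, centroids[label]))
```

In a real system, `classify_frame` would be called once per 20 ms analysis frame on the LSF vector of that frame.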

IC991774.PDF (From Author) IC991774.PDF (Rasterized)
