
Abstract: Session SP-7


SP-7.1  

Speech Analysis/Synthesis/Conversion by Using Sequential Processing
Panuthat Boonpramuk, Tetsuo Funada (Faculty of Engineering, Kanazawa University), Noboru Kanedera (Ishikawa National College of Technology)

This paper presents a method for speech analysis/synthesis/conversion using sequential processing. The aims of this method are to improve the quality of synthesized speech and to convert the original speech into speech with different characteristics. We apply a Kalman filter to estimate the auto-regressive coefficients of the 'time-varying AR model with unknown input' (ARUI model), which we have proposed as an improvement over the conventional AR model, and we use a band-pass filter to produce a 'guide signal' for extracting the pitch period from the residual signal. These signals are used to construct the driving source signal for speech synthesis. We also use the guide signal for speech conversion, such as changes in pitch and utterance length. Moreover, we show experimentally that, by using the smoothed auto-regressive coefficients, this method can analyze/synthesize/convert speech without causing instability.
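
The ARUI model and the guide-signal construction are specific to this paper and are not reproduced here; as a rough illustration of the sequential-estimation idea only, the Python/NumPy sketch below tracks time-varying AR coefficients with a Kalman filter under a random-walk state assumption (the model order and noise variances are placeholder values).

```python
import numpy as np

def track_ar_coefficients(x, p=10, q=1e-4, r=1e-2):
    """Sequentially estimate time-varying AR(p) coefficients with a Kalman filter.

    Assumptions (not from the paper): the coefficients follow a random walk with
    process-noise variance q, and the observation noise variance is r.
    """
    a = np.zeros(p)                      # current AR-coefficient estimate (state)
    P = np.eye(p)                        # state covariance
    Q = q * np.eye(p)                    # process-noise covariance
    coeffs = np.zeros((len(x), p))
    for n in range(p, len(x)):
        h = x[n - p:n][::-1]             # regressor: [x[n-1], ..., x[n-p]]
        P = P + Q                        # predict (random-walk state model)
        e = x[n] - h @ a                 # innovation (prediction residual)
        s = h @ P @ h + r                # innovation variance
        k = P @ h / s                    # Kalman gain
        a = a + k * e                    # update coefficient estimate
        P = P - np.outer(k, h @ P)       # update covariance
        coeffs[n] = a
    return coeffs
```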


SP-7.2  

Modelling Energy Flow in the Vocal Tract with Applications to Glottal Closure and Opening Detection
Mike Brookes, Han Pin Loke (Imperial College)

The pitch-synchronous analysis that is used in several areas of speech processing often requires robust detection of the instants of glottal closure and opening. In this paper we derive expressions for the flow of acoustic energy in the lossless-tube model of the vocal tract and show how linear predictive analysis may be used to estimate the waveform of acoustic input power at the glottis. We demonstrate that this signal may be used to identify the instants of glottal closure and opening during voiced speech and contrast it with the LPC residual signal that previous authors have used for this purpose.
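
The paper's contribution is the acoustic energy-flow waveform derived from the lossless-tube model; for reference, the LPC-residual approach that it is contrasted with can be sketched in Python/SciPy roughly as follows (the windowing, model order, and peak-spacing constraint are placeholder choices).

```python
import numpy as np
from scipy.signal import lfilter, find_peaks
from scipy.linalg import solve_toeplitz

def lpc(frame, order=12):
    """Autocorrelation-method LPC; returns [1, -a1, ..., -ap] for inverse filtering."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])    # Yule-Walker equations
    return np.concatenate(([1.0], -a))

def gci_candidates_from_residual(x, fs, order=12, min_f0=60.0):
    """Rough glottal-closure-instant candidates from peaks of the LPC residual."""
    a = lpc(x * np.hamming(len(x)), order)
    residual = lfilter(a, [1.0], x)                  # inverse-filter the speech frame
    min_spacing = int(fs / (2 * min_f0))             # crude lower bound on peak spacing
    peaks, _ = find_peaks(np.abs(residual), distance=min_spacing)
    return peaks
```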


SP-7.3  

Fitting the Mel Scale
Srinivasan Umesh (Indian Institute of Technology), Leon Cohen (City University of New York), Douglas Nelson (Dept. of Defense USA)

We show that there are many qualitatively different equations, each with few parameters, that fit the experimentally obtained Mel scale. We investigate the often-made remark that there are two regions to the Mel scale, the first region ($< \sim$ 1000 Hz) being linear and the upper region being logarithmic. We show that there is no evidence, based on the experimental data points, that there are two qualitatively different regions, or that the lower region is linear and the upper region logarithmic. In fact, $F_M = f/(af + b)$, where $F_M$ and $f$ are the mel and physical frequency respectively, fits better than a line in the ``linear'' region or a logarithm in the ``log'' region.
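
The experimental Mel-scale data points used in the paper are not reproduced here; purely as an illustration of the fitting procedure, the sketch below fits $F_M = f/(af+b)$ by noting that $f/F_M = af + b$ is linear in $f$, using the common analytic approximation $2595\log_{10}(1+f/700)$ as stand-in data.

```python
import numpy as np

# Stand-in "data": the common analytic Mel approximation, NOT the paper's
# experimental data points.
f = np.linspace(40.0, 8000.0, 200)               # physical frequency in Hz
mel = 2595.0 * np.log10(1.0 + f / 700.0)

# F_M = f / (a f + b)  <=>  f / F_M = a f + b, so a and b come from a linear fit.
a, b = np.polyfit(f, f / mel, 1)
mel_fit = f / (a * f + b)

rms_error = np.sqrt(np.mean((mel_fit - mel) ** 2))
print(f"a = {a:.6g}, b = {b:.6g}, RMS error = {rms_error:.2f} mel")
```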


SP-7.4  

Fast Accent Identification and Accented Speech Recognition
Pascale Fung, Wai Kat Liu (Hong Kong University of Science and Technology (HKUST))

The performance of speech recognition systems degrades when the speaker's accent differs from that of the training set. Both accent-independent and accent-dependent recognition require the collection of more training data. In this paper, we propose a faster accent classification approach using phoneme-class models. We also present our findings on acoustic features sensitive to a Cantonese accent, and possibly other Asian language accents. In addition, we show how we can rapidly transform a native-accent pronunciation dictionary into one for accented speech by simply using knowledge of the foreign speaker's native language. The use of this accent-adapted dictionary reduces the recognition error rate by 13.5%, similar to the results obtained from a longer, data-driven process.
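
The actual phoneme-substitution rules for Cantonese-accented English are part of the paper's findings and are not reproduced here; the sketch below only illustrates the mechanism of transforming a native-accent pronunciation dictionary using hypothetical substitution rules.

```python
# Hypothetical native-to-accented phoneme substitutions (placeholders, not the
# rules derived in the paper).
ACCENT_RULES = {"th": "d", "r": "w"}

def adapt_pronunciation(phonemes, rules=ACCENT_RULES):
    """Map a native-accent phoneme sequence to its accented variant."""
    return [rules.get(ph, ph) for ph in phonemes]

def adapt_dictionary(native_dict, rules=ACCENT_RULES):
    """Apply the substitution rules to every entry of a pronunciation dictionary."""
    return {word: adapt_pronunciation(pron, rules) for word, pron in native_dict.items()}

print(adapt_dictionary({"three": ["th", "r", "iy"]}))   # {'three': ['d', 'w', 'iy']}
```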


SP-7.5  

Relevancy of Time-Frequency Features for Phonetic Classification Measured by Mutual Information
Howard H Yang, Sarel J Van Vuuren, Hynek Hermansky (Oregon Graduate Institute of Science and Technology)

In this paper we use mutual information to study the distribution in time and frequency of information relevant for phonetic classification. A large database of hand-labeled fluent speech is used to (a) compute the mutual information between phoneme labels and a point of logarithmic energy in the time-frequency plane and (b) compute the joint mutual information between phoneme labels and two points of logarithmic energy in the time-frequency plane.
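
As a minimal sketch of the kind of estimate involved, the Python/NumPy function below computes the mutual information between discrete phoneme labels and the log energy at a single time-frequency point, with the energy quantized into histogram bins (the bin count is a placeholder; the joint mutual information for two points follows the same pattern with the pair of quantized energies treated as one discrete variable).

```python
import numpy as np

def mutual_information(labels, log_energy, n_bins=20):
    """I(phoneme label; quantized log energy at one time-frequency point), in bits."""
    edges = np.histogram_bin_edges(log_energy, bins=n_bins)
    bins = np.digitize(log_energy, edges[1:-1])          # quantize the energies
    joint = np.zeros((labels.max() + 1, n_bins))
    for l, b in zip(labels, bins):
        joint[l, b] += 1.0
    joint /= joint.sum()                                 # joint distribution P(label, bin)
    px = joint.sum(axis=1, keepdims=True)                # P(label)
    py = joint.sum(axis=0, keepdims=True)                # P(bin)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))
```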


SP-7.6  

Hidden Markov Models Based on Multi-Space Probability Distribution for Pitch Pattern Modeling
Keiichi Tokuda (Nagoya Institute of Technology, Nagoya, Japan), Takashi Masuko (Tokyo Institute of Technology, Japan), Noboru Miyazaki (NTT Basic Research Laboratories, Japan), Takao Kobayashi (Tokyo Institute of Technology, Japan)

This paper discusses a hidden Markov model (HMM) based on a multi-space probability distribution (MSD). HMMs are widely used statistical models for characterizing sequences of speech spectra and have successfully been applied to speech recognition systems; this suggests that the HMM is also useful for modeling the pitch patterns of speech. However, the conventional discrete or continuous HMMs cannot be applied to pitch pattern modeling, since the observation sequence of a pitch pattern is composed of one-dimensional continuous values and a discrete symbol representing ``unvoiced''. The MSD-HMM includes the discrete HMM and the continuous mixture HMM as special cases, and can further model sequences of observation vectors of variable dimension, including zero-dimensional observations, i.e., discrete symbols. As a result, MSD-HMMs can model pitch patterns without heuristic assumptions. We derive a reestimation algorithm for the extended HMM and show that it can find a critical point of the likelihood function.
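
As a rough illustration of the multi-space idea (not the paper's full formulation), a single MSD-HMM state can weight a one-dimensional voiced space against a zero-dimensional unvoiced space; the sketch below uses one Gaussian per voiced space purely for brevity.

```python
import math

def msd_emission_prob(obs, w_voiced, mean, var, w_unvoiced):
    """Emission probability of one MSD-HMM state for a pitch observation.

    obs is either a float (e.g. log F0 of a voiced frame) or the symbol "unvoiced".
    The voiced space carries a 1-D Gaussian; the unvoiced space is zero-dimensional,
    so its contribution is just its space weight.
    """
    if obs == "unvoiced":
        return w_unvoiced
    gauss = math.exp(-0.5 * (obs - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
    return w_voiced * gauss

# Example: a state with space weights 0.7 (voiced) and 0.3 (unvoiced)
print(msd_emission_prob(5.1, 0.7, 5.0, 0.04, 0.3))
print(msd_emission_prob("unvoiced", 0.7, 5.0, 0.04, 0.3))
```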


SP-7.7  

An Algorithm for Glottal Volume Velocity Estimation
Ashraf Alkhairy (M. I. T.)

We present a new method for the estimation of the glottal volume velocity from voiced segments of the radiated acoustic speech waveform. Our algorithm is based on spectral factorization of the signal and is a general-purpose procedure. It does not suffer from residual effects or assume constraining models for the vocal tract and the glottal source, as is commonly the case with existing methods. The resulting estimate of the glottal volume velocity is accurate and can be used for modeling and synthesis purposes.


SP-7.8  

Frame-Level Noise Classification in Mobile Environments
Khaled El-Maleh (Electrical and Computer Engineering Dept., McGill University), Ara Samouelian (School of Elect., Comp. and Telecomm. Eng., University of Wollongong), Peter Kabal (Electrical and Computer Engineering Dept., McGill University)

Background environmental noises degrade the performance of speech-processing systems (e.g., speech coding, speech recognition). By adapting the processing to the type of background noise, performance can be enhanced; this requires noise classification. In this paper, four pattern-recognition frameworks have been used to design noise classification algorithms. Classification is done on a frame-by-frame basis (e.g., once every 20 ms). Five noises commonly encountered in mobile telephony (car, street, babble, factory, and bus) have been considered in our study. Our experimental results show that Line Spectral Frequencies (LSFs) are robust features for distinguishing the different classes of noise.
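
The paper evaluates four pattern-recognition frameworks; the sketch below only illustrates converting a frame's LPC coefficients to LSFs and assigning the frame to the nearest class mean, a deliberately simple stand-in classifier (the model order and class statistics are placeholders).

```python
import numpy as np

def lpc_to_lsf(a):
    """Convert LPC coefficients a = [1, a1, ..., ap] to line spectral frequencies (rad)."""
    a_ext = np.concatenate((a, [0.0]))
    P = a_ext + a_ext[::-1]          # symmetric polynomial  A(z) + z^-(p+1) A(1/z)
    Q = a_ext - a_ext[::-1]          # antisymmetric polynomial
    angles = np.angle(np.concatenate((np.roots(P), np.roots(Q))))
    return np.sort(angles[(angles > 0) & (angles < np.pi)])   # p LSFs on the upper unit circle

def classify_frame(lsf, class_means):
    """Assign a frame to the noise class whose mean LSF vector is closest (toy classifier)."""
    return min(class_means, key=lambda c: np.linalg.norm(lsf - class_means[c]))
```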



