Abstract: Session SPEC-4

SPEC-4.1  

SPEECH UNDER STRESS CONDITIONS: OVERVIEW OF THE EFFECT ON SPEECH PRODUCTION AND ON SYSTEM PERFORMANCE
Herman J.M. Steeneken (TNO Human Factors Research Institute), John H.L. Hansen (Robust Speech Processing Laboratory, CSLU-Boulder)

The NATO research study group on "Speech and Language Technology" recently completed a three-year project on the effect of "stress" on speech production and system performance. For this purpose various speech databases were collected. A definition of various states of stress and the corresponding type of stressor is proposed. Results are reported from analysis and assessment studies performed with the databases collected for this project.


SPEC-4.2  

THE LOMBARD EFFECT: A REFLEX TO BETTER COMMUNICATE WITH OTHERS IN NOISE
Jean-Claude Junqua, Steven Fincke, Ken Field (Panasonic Technologies, Inc.)

To study the Lombard reflex, more realistic databases representing real-world conditions need to be recorded and analyzed. In this paper we 1) summarize a procedure to record Lombard data which provides a good approximation of realistic conditions, 2) present an analysis per class of sounds for duration and energy of words recorded while subjects are listening to noise through open-ear headphones a) when speakers are in communication with a recognition device and b) when reading a list, and 3) report on the influence of speaking style on speaker-dependent and speaker-independent experiments. This paper extends a previous study aimed at analyzing the influence of the communication factor on the Lombard reflex. We also show evidence that it is difficult to separate the speaker from the environment stressor (in this case the noise) when studying the Lombard reflex. The main conclusion of our pilot study is that the communication factor should not be neglected because it strongly influences the Lombard reflex.


SPEC-4.3  

METHODS FOR STRESS CLASSIFICATION: NONLINEAR TEO AND LINEAR SPEECH BASED FEATURES
Guojun Zhou, John H.L. Hansen, James F. Kaiser (Duke University)

Speech production variations due to perceptually induced stress contribute significantly to reduced speech processing performance. One approach that can improve the robustness of speech processing (e.g., recognition) algorithms against stress is to formulate an objective classification of speaker stress based upon the acoustic speech signal. In this paper, an overview of recent methods for stress classification is presented. First, we review traditional pitch-based methods for stress detection and classification. Second, neural network based stress classifiers with cepstral-based features, as well as wavelet-based classification algorithms are considered. The effect of stress on linear speech features is discussed, followed by the application of linear features and Teager Energy Operator (TEO) based nonlinear features for effective stress classification. A new evaluation for stress classification and assessment is presented using a critical band frequency partition based TEO feature and the combination of several linear features. Results using NATO databases of actual speech under stress are presented. Finally, we discuss issues relating to stress classification across known and unknown speakers and suggest areas for further research.
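As a rough illustration of the nonlinear feature named in this abstract, the discrete Teager Energy Operator is commonly written as psi[n] = x[n]^2 - x[n-1]*x[n+1]. The sketch below shows only that basic operator; the critical band frequency partition and the classifier described by the authors are not reproduced, and the function name and the test tone are assumptions made for the example.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two samples
    shorter than the input.
    """
    if len(x) < 3:
        raise ValueError("need at least three samples")
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative input (not from the paper): a pure tone A * cos(omega * n).
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(50)]
psi = teager_energy(tone)

# For a pure tone the operator is constant at (A * sin(omega))**2, so it
# tracks amplitude and frequency together.
expected = (amplitude * math.sin(omega)) ** 2
```

On a pure tone every element of `psi` matches `expected` up to rounding, which is why the operator is often described as an energy measure that is sensitive to both amplitude and frequency.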


SPEC-4.4  

ANALYSIS OF MRATE, SHIMMER, JITTER AND F0 CONTOUR FEATURES ACROSS STRESS AND SPEAKING STYLE IN THE SUSAS DATABASE
Raymond E. Slyh, W. Todd Nelson (Air Force Research Laboratory), Eric G. Hansen (Veridian)

This paper highlights the results of an investigation of several features across the style classes of the "simulated" portion of the SUSAS database. The features considered here include a recently-introduced measure of speaking rate called mrate, measures of shimmer, measures of jitter, and features derived from fundamental frequency (F0) contours. The F0 contour features are the means of F0 and Delta F0 over the first, middle, and last thirds of the ordered set of voiced frames for each word. Mrate exhibits differences between the Fast, Neutral, and Slow styles and between the Loud, Neutral, and Soft styles. Shimmer and jitter exhibit differences that are similar to those of mrate; however, the shimmer and jitter differences are less consistent than the mrate differences across the speakers in the database. Several F0 contour features exhibit differences between the Angry, Loud, Lombard, and Question styles and most of the other styles.
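The F0 contour features described in this abstract (means of F0 and Delta F0 over the first, middle, and last thirds of a word's voiced frames) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: Delta F0 is taken to be the frame-to-frame first difference, the thirds are cut at integer positions n//3 and 2n//3, and all function names and F0 values are hypothetical.

```python
def mean_by_thirds(values):
    """Mean of the first, middle, and last thirds of an ordered sequence.

    Assumption: boundaries fall at n // 3 and 2 * n // 3, so any leftover
    elements go to the later thirds.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    a, b = n // 3, (2 * n) // 3
    parts = (values[:a], values[a:b], values[b:])
    return [sum(p) / len(p) for p in parts]


def f0_contour_features(f0_voiced):
    """Six features for one word: thirds-means of F0, then of Delta F0.

    f0_voiced holds F0 (Hz) for the voiced frames only, in time order.
    Assumption: Delta F0 is the first difference between adjacent frames.
    """
    if len(f0_voiced) < 4:
        raise ValueError("need at least four voiced frames")
    delta = [f0_voiced[i + 1] - f0_voiced[i] for i in range(len(f0_voiced) - 1)]
    return mean_by_thirds(f0_voiced) + mean_by_thirds(delta)


# Illustrative contour (made-up values): F0 rising by 10 Hz per frame.
features = f0_contour_features([100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0])
```

For this steadily rising contour the three F0 means increase from the first third to the last, while the three Delta F0 means are equal, which is the kind of contrast such features are meant to capture.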


SPEC-4.5  

SOME CHARACTERISTICS OF SPEECH PRODUCED UNDER HIGH G-FORCE AND PRESSURE BREATHING
Allan J South (Defence Evaluation and Research Agency)

The performance of speech recognisers in combat aircraft is degraded seriously by the extreme physical stresses to which the crew are subjected. This paper describes measurements of first and second formant frequencies of nine vowels from one speaker recorded under high levels of acceleration, with and without positive pressure breathing. Under acceleration alone, F2 is reduced for high front vowels, while F1 remains constant, but for back and mid vowels, F1 reduces with little change in F2. When positive pressure breathing is introduced, nearly all vowels are affected, and the "vowel triangle" on the F1-F2 plane collapses inwards, towards the neutral vowel position. If these changes are found to be consistent between speakers, it is hoped to develop techniques of voice transformation to reverse them, and thus improve the performance of speech recognisers in this harsh environment.
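One way to put a number on the inward collapse of the "vowel triangle" is to compare its area on the F1-F2 plane before and after the change. The sketch below is an illustration only: the corner formant values are hypothetical placeholders rather than measurements from the paper, the neutral position is approximated by the triangle's centroid, and the uniform shrink factor is an assumption.

```python
def triangle_area(p, q, r):
    """Area of a triangle from three (F1, F2) points, via the shoelace formula."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def shrink_toward(point, center, factor):
    """Move a point toward a center; factor=1 leaves it, factor=0 lands on the center."""
    return tuple(c + factor * (v - c) for v, c in zip(point, center))


# Hypothetical corner vowels as (F1, F2) in Hz; placeholder values only.
corners = [(300.0, 2300.0), (750.0, 1200.0), (320.0, 900.0)]

# Assumption: use the centroid as a stand-in for the neutral vowel position.
neutral = tuple(sum(axis) / len(corners) for axis in zip(*corners))

# Assumption: every corner moves halfway toward the neutral position.
collapsed = [shrink_toward(c, neutral, 0.5) for c in corners]

area_ratio = triangle_area(*collapsed) / triangle_area(*corners)
```

A uniform shrink by a factor k scales the area by k squared, so `area_ratio` here is 0.25. With real formant measurements the same ratio would give a single figure for how far the vowel space has contracted.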



Last Update:  February 4, 1999         Ingo Höntsch