Authors:
Herman J.M. Steeneken
John H.L. Hansen
Page (NA) Paper number 3033
Abstract:
The NATO research study group on "Speech and Language Technology"
recently completed a three-year project on the effect of "stress"
on speech production and system performance. For this purpose various
speech databases were collected. A definition of various states of
stress and the corresponding type of stressor is proposed. Results
are reported from analysis and assessment studies performed with the
databases collected for this project.
Authors:
Jean-Claude Junqua
Steven C Fincke
Kenneth L Field
Page (NA) Paper number 3034
Abstract:
To study the Lombard reflex, more realistic databases representing
real-world conditions need to be recorded and analyzed. In this paper
we 1) summarize a procedure to record Lombard data which provides a
good approximation of realistic conditions, 2) present an analysis
per class of sounds for duration and energy of words recorded while
subjects are listening to noise through open-ear headphones a) when
speakers are in communication with a recognition device and b) when
reading a list, and 3) report on the influence of speaking style on
speaker-dependent and speaker-independent experiments. This paper extends
a previous study aimed at analyzing the influence of the communication
factor on the Lombard reflex. We also show evidence that it is difficult
to separate the speaker from the environment stressor (in this case
the noise) when studying the Lombard reflex. The main conclusion of
our pilot study is that the communication factor should not be neglected
because it strongly influences the Lombard reflex.
Authors:
Guojun Zhou, Duke University (USA)
John H.L. Hansen, Duke University (USA)
James F. Kaiser, Duke University (USA)
Page (NA) Paper number 3035
Abstract:
Speech production variations due to perceptually induced stress contribute
significantly to reduced speech processing performance. One approach
that can improve the robustness of speech processing (e.g., recognition)
algorithms against stress is to formulate an objective classification
of speaker stress based upon the acoustic speech signal. In this paper,
an overview of recent methods for stress classification is presented.
First, we review traditional pitch-based methods for stress detection
and classification. Second, neural network based stress classifiers
with cepstral-based features, as well as wavelet-based classification
algorithms, are considered. The effect of stress on linear speech features
is discussed, followed by the application of linear features and Teager
Energy Operator (TEO) based nonlinear features for effective stress
classification. A new evaluation for stress classification and assessment
is presented using a critical band frequency partition based TEO feature
and the combination of several linear features. Results using NATO
databases of actual speech under stress are presented. Finally, we
discuss issues relating to stress classification across known and unknown
speakers and suggest areas for further research.
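As a rough illustration of the nonlinear feature mentioned in the abstract
above, the discrete Teager Energy Operator is commonly written as
Psi[x(n)] = x(n)^2 - x(n-1) x(n+1). The sketch below computes only this
basic operator. The function name teager_energy is a hypothetical helper,
and the critical band frequency partition and the classifier described in
the paper are not reproduced here.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns an array two samples shorter than the input, because the
    operator is undefined at the first and last sample.
    (Hypothetical helper for illustration only.)
    """
    x = np.asarray(x, dtype=float)
    if x.size < 3:
        raise ValueError("need at least three samples")
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For a pure tone A*cos(omega*n) the operator is constant: A^2 * sin(omega)^2
n = np.arange(200)
tone = 2.0 * np.cos(0.3 * n)
psi = teager_energy(tone)
```

For a single sinusoid the output depends on both amplitude and frequency,
which is why the operator is usually applied to band-limited signals rather
than to the full-band waveform.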
Authors:
Raymond E. Slyh
W. Todd Nelson
Eric G. Hansen
Page (NA) Paper number 3036
Abstract:
This paper highlights the results of an investigation of several features
across the style classes of the "simulated" portion of the SUSAS
database. The features considered here include a recently introduced
measure of speaking rate called mrate, measures of shimmer, measures
of jitter, and features derived from fundamental frequency (F0) contours.
The F0 contour features are the means of F0 and Delta F0 over the first,
middle, and last thirds of the ordered set of voiced frames for each
word. Mrate exhibits differences between the Fast, Neutral, and Slow
styles and between the Loud, Neutral, and Soft styles. Shimmer and
jitter exhibit differences that are similar to those of mrate; however,
the shimmer and jitter differences are less consistent than the mrate
differences across the speakers in the database. Several F0 contour
features exhibit differences between the Angry, Loud, Lombard, and
Question styles and most of the other styles.
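The F0 contour features described in the abstract above (means of F0 and
Delta F0 over the first, middle, and last thirds of a word's voiced frames)
might be computed along the following lines. This is a minimal sketch under
stated assumptions: the function name f0_third_means is hypothetical, Delta
F0 is taken here to be a frame-to-frame difference estimated with
np.gradient, and uneven frame counts are split with np.array_split. The
paper may define these details differently.

```python
import numpy as np

def f0_third_means(f0_voiced):
    """Mean F0 and mean Delta F0 over thirds of the voiced frames of a word.

    f0_voiced: F0 values (Hz) for the ordered voiced frames of one word.
    Assumption: Delta F0 is approximated by np.gradient (central
    differences), which may differ from the definition used in the paper.
    """
    f0 = np.asarray(f0_voiced, dtype=float)
    if f0.size < 3:
        raise ValueError("need at least three voiced frames")
    delta = np.gradient(f0)
    # array_split tolerates frame counts that are not a multiple of three
    f0_parts = np.array_split(f0, 3)
    delta_parts = np.array_split(delta, 3)
    return {
        "f0_means": [float(p.mean()) for p in f0_parts],
        "delta_f0_means": [float(p.mean()) for p in delta_parts],
    }

# Usage with a made-up, steadily rising contour (not data from the paper)
features = f0_third_means([100, 110, 120, 130, 140, 150])
```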
Authors:
Allan J South
Page (NA) Paper number 3000
Abstract:
The performance of speech recognisers in combat aircraft is degraded
seriously by the extreme physical stresses to which the crew are subjected.
This paper describes measurements of first and second formant frequencies
of nine vowels from one speaker recorded under high levels of acceleration,
with and without positive pressure breathing. Under acceleration alone,
F2 is reduced for high front vowels, while F1 remains constant, but
for back and mid vowels, F1 reduces with little change in F2. When
positive pressure breathing is introduced, nearly all vowels are affected,
and the "vowel triangle" on the F1-F2 plane collapses inwards, towards
the neutral vowel position. If these changes are found to be consistent
between speakers, it is hoped to develop techniques of voice transformation
to reverse them, and thus improve the performance of speech recognisers
in this harsh environment.