Authors:
Nam Soo Kim
Page (NA) Paper number 1540
Abstract:
The environmental conditions in which a speech recognition system should
be operating are usually nonstationary. We present an approach to compensate
for the effects of time-varying noise using a bank of Kalman filters.
The presented method is based on the interacting multiple model (IMM)
technique well-known in the area of multiple target tracking. Moreover,
we propose a way to get fixed-interval smoothed estimates for the environmental
parameters. The performance of the proposed approaches is evaluated
in continuous digit recognition experiments in which both slowly
evolving and rapidly varying noise sources are added to simulate
noisy environments.
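As a rough illustration of the building block involved, a single scalar Kalman filter tracking a drifting noise parameter might look like the sketch below. It assumes a random-walk state model. The function name, the variances, and the scalar state are illustrative assumptions, not details from the paper. The IMM mixing across a bank of such filters and the fixed-interval smoothing mentioned in the abstract are not shown.

```python
def kalman_track(observations, process_var=1e-2, obs_var=1.0,
                 init_mean=0.0, init_var=1.0):
    """Track a drifting scalar with a random-walk Kalman filter.

    All parameter names and default values are hypothetical.
    """
    mean, var = init_mean, init_var
    estimates = []
    for z in observations:
        # Predict: a random walk keeps the mean and inflates the variance.
        var = var + process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = var / (var + obs_var)
        mean = mean + gain * (z - mean)
        var = (1.0 - gain) * var
        estimates.append(mean)
    return estimates
```

A bank of filters could be formed by running this with several values of `process_var`, a small one for slowly evolving noise and a larger one for rapidly varying noise.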
Authors:
Wei-Tyng Hong
Sin-Horng Chen
Page (NA) Paper number 1607
Abstract:
A segment-based C0 (the zeroth-order cepstral coefficient) adaptation
scheme for PMC-based Mandarin speech recognition is proposed in this
paper. It incorporates a new C0 model of speech signal into the PMC
method to improve the gain matching between the clean-speech HMM models
and the current noise model. The C0 model is constructed in the training
phase by jointly modeling the normalized C0 with other MFCC recognition
features to form C0-normalized HMM models. In the testing phase, it
pre-segments the input utterance into syllable-like segments, performs
C0-denormalization operations to expand the C0-normalized HMM models,
and uses them in the PMC method. Compared with the conventional PMC
method, the proposed method can achieve a much better noise compensation
effect due to the use of more precise gain matching in the PMC model
combination. Experimental results showed that the base-syllable accuracy
rate improved significantly for continuous noisy Mandarin speech
recognition.
Authors:
Jeih-Weih Hung, Dept of Electrical Engineering, National Taiwan University (Taiwan)
Jia-Lin Shen
Lin-Shan Lee, Dept of Electrical Engineering, National Taiwan University (Taiwan)
Page (NA) Paper number 2151
Abstract:
The parallel model combination (PMC) technique has been very successful
and frequently used to improve the performance of a speech recognition
system under noisy environments. In this approach it is assumed that
the log spectrum of speech signals is Gaussian-distributed, which is
not always valid, especially when the number of mixtures in the HMMs
is small. In this paper, a simple approach is proposed to improve the
PMC method by splitting the mixtures before the domain transformation
process in PMC is performed, and merging the mixtures back to the original
number after the PMC processes are completed. Preliminary experimental
results show that the increased number of mixtures during the PMC processes
can in fact provide significant improvements over the original PMC
method in terms of the recognition accuracies, especially when the
SNR is low.
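For context, the domain transformation in PMC is commonly carried out with the log-normal approximation, which maps a Gaussian in the log-spectral domain to the linear domain and back. The sketch below shows that mapping for a single log-spectral channel with scalar means and variances. The function names and the `gain` parameter are illustrative assumptions, and the mixture splitting and merging proposed in the paper are not shown.

```python
import math

def log_to_linear(mu, var):
    """Map a log-domain Gaussian (mu, var) to linear-domain mean and variance."""
    mu_lin = math.exp(mu + var / 2.0)
    var_lin = mu_lin ** 2 * (math.exp(var) - 1.0)
    return mu_lin, var_lin

def linear_to_log(mu_lin, var_lin):
    """Inverse mapping, from linear-domain moments back to the log domain."""
    var = math.log(var_lin / mu_lin ** 2 + 1.0)
    mu = math.log(mu_lin) - var / 2.0
    return mu, var

def pmc_combine(speech, noise, gain=1.0):
    """Combine log-domain (mu, var) pairs for speech and additive noise."""
    s_mu, s_var = log_to_linear(*speech)
    n_mu, n_var = log_to_linear(*noise)
    # Speech and noise are assumed independent and additive in the linear domain.
    return linear_to_log(gain * s_mu + n_mu, gain ** 2 * s_var + n_var)
```

The accuracy of this mapping depends on how well a single Gaussian fits the log spectrum, which is the assumption the abstract questions when the number of mixtures is small.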
Authors:
Gunther Ruske, Inst. for Human-Machine-Communication, Munich University of Technology, Germany (Germany)
Ki Yong Lee, School of Electronic Engineering, Soongsil University, 1-1 Sangdo-5Dong, Dongjak-Ku, Seoul, 156-743 Korea (Korea)
Page (NA) Paper number 1425
Abstract:
In this paper, gain-adapted speech recognition in unknown noise is
developed in the time domain. The noise is assumed to be colored.
A nonstationary autoregressive (NAR) hidden Markov model (HMM) is used
to model clean speech. The nonstationary AR model is described by polynomial
functions with a linear combination of M known basis functions. Enhancement
using multiple Kalman filters is performed for the gain contour of
speech and for estimation of the noise model when only the noisy signal is
available.
Authors:
Alexander Fischer, Philips Research Laboratories, Aachen, Germany (Germany)
Volker Stahl, Philips Research Laboratories Aachen, Germany (Germany)
Page (NA) Paper number 1449
Abstract:
Data collections in the car environment require much more effort in
terms of cost and time than in the telephone or office environment.
Therefore, we apply supervised database adaptation from the telephone
environment to the car environment to allow quick setup of car environment
recognizers. Further reduction of word error rate is obtained by unsupervised
online adaptation during recognition. We investigate the common techniques
MLLR and MAP for that purpose. We give results on command word recognition
in the car environment for all combinations of database and online
adaptation in task-dependent and task-independent scenarios. The possibility
of setting up speech recognizers for the car environment based on telephone
data and a limited amount of adaptation material from the car environment
is demonstrated.
Authors:
Diego Giuliani
Marco Matassoni
Maurizio Omologo
Piergiorgio Svaizer
Page (NA) Paper number 1895
Abstract:
This paper addresses the problem of hands-free speech recognition in
a noisy office environment. An array of six omnidirectional microphones
and a corresponding time delay compensation module are used to provide
a beamformed signal as input to an HMM-based recognizer. Training of
HMMs is performed either using a clean speech database or using a filtered
version of the same database. The filtering consists of a convolution
with the acoustic impulse response between speaker and microphone,
to reproduce the reverberation effect. Background noise is added to
provide the desired SNR. The paper shows that the new models trained
on these data perform better than the baseline ones. Furthermore, the
paper investigates MLLR adaptation of the new models. It is shown
that a further performance improvement is obtained, reaching
a 98.7% WRR in a connected digit recognition task when the talker
is at 1.5 m distance from the array.
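The construction of the filtered training data described above (convolution with an impulse response, then noise added at a target SNR) can be sketched as follows. This is a minimal pure-Python illustration. The function names are hypothetical, the signals are plain lists of samples, and the noise is assumed to be at least as long as the speech. The paper's own procedure may differ in detail.

```python
import math

def convolve(signal, impulse_response):
    """Causal convolution, truncated to the length of the input signal."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                out[n] += h * signal[n - k]
    return out

def power(x):
    """Mean squared value of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def contaminate(clean, impulse_response, noise, snr_db):
    """Add reverberation, then add noise scaled to the requested SNR in dB."""
    reverberant = convolve(clean, impulse_response)
    noise = noise[:len(reverberant)]
    # Scale the noise so that reverberant power / noise power = 10^(snr_db/10).
    scale = math.sqrt(power(reverberant) / (power(noise) * 10 ** (snr_db / 10)))
    return [r + scale * n for r, n in zip(reverberant, noise)]
```

Here the SNR is measured against the reverberant speech rather than the clean speech. That is a design choice made for this sketch, not something the abstract specifies.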
Authors:
Chafic E Mokbel, France Telecom - CNET - DIH/DIPS (Currently at IDIAP) (France)
Olivier Collin, France Telecom - CNET - DIH/DIPS (France)
Page (NA) Paper number 1468
Abstract:
Classical adaptation approaches generally allow a reliably trained
model to match a particular condition. In this paper, we define an
incremental version of the segmental-EM algorithm. This method makes it
possible to incrementally enrich a model first trained with a limited amount
of data. Memory constraints allow only the initial data statistics
to be stored. The proposed method uses these statistics by fixing,
within the segmental EM algorithm applied on both initial and new data,
the initial optimal paths in the model for the initial data. We prove
theoretically that this is equivalent to segmental MAP adaptation
with a specific choice of priors. In experiments on two speaker-dependent
telephone databases, the approach made it possible to incrementally integrate
new conditions of use. The performance was slightly lower than that
obtained with classical training over the whole data. As expected from
the MAP interpretation of the algorithm, initial data characteristics
largely influence the model evolution.
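One simple way to see why stored statistics can behave like a MAP prior is the case of a single Gaussian mean. Pooling the stored count and sum of the initial data with new data gives the same value as a MAP mean update whose prior mean is the initial mean and whose prior weight equals the initial count. The sketch below illustrates only this scalar case. The function names and the weight `tau` are assumptions, and the paper's segmental formulation and its actual choice of priors are not given in the abstract.

```python
def map_mean(prior_mean, tau, new_data):
    """MAP estimate of a Gaussian mean with prior weight tau."""
    return (tau * prior_mean + sum(new_data)) / (tau + len(new_data))

def pooled_mean_from_stats(count, total, new_data):
    """Mean over initial and new data, using only stored initial statistics."""
    return (total + sum(new_data)) / (count + len(new_data))

initial = [1.0, 2.0, 3.0]   # initial data, kept here only for illustration
new = [4.0, 5.0]
count, total = len(initial), sum(initial)

pooled = pooled_mean_from_stats(count, total, new)
as_map = map_mean(total / count, count, new)
```

Because the prior weight grows with the amount of initial data, a large initial set dominates the estimate, which is consistent with the abstract's remark that the initial data strongly influence how the model evolves.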
Authors:
Bishnu S Atal, AT&T Labs, Florham Park, NJ 07932, USA (USA)
Page (NA) Paper number 1910
Abstract:
Speech recognition is usually regarded as a problem in the field of
pattern recognition, where one first estimates the probability density
function of each pattern to be recognized and then uses Bayes' theorem
to identify the pattern which provides the highest likelihood for the
observed speech data. In this paper, we will take a different approach
to this problem. In speech recognition, the goal is communication of
information by voice and we will discuss the basics of speech recognition
from a communication perspective. The speech signal at the acoustic
level has a bit rate of 64 kb/s but the underlying sound patterns have
an information rate of less than 100 b/s. What is the role of this
high bit rate at the acoustic level? We will discuss the principles
of decoding patterns that are submerged in an ocean of seemingly irrelevant
information.
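The 64 kb/s figure matches standard telephone-band PCM, assumed here to be 8000 samples per second at 8 bits per sample. This breakdown is an assumption and is not stated in the abstract. A short calculation shows the size of the gap between the two rates:

```python
SAMPLE_RATE_HZ = 8000      # assumed telephone-band sampling rate
BITS_PER_SAMPLE = 8        # assumed PCM word length

acoustic_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE   # bit rate at the acoustic level
info_bps = 100                                    # upper bound quoted in the abstract

# Lower bound on the ratio, since the information rate is below 100 b/s.
ratio = acoustic_bps / info_bps
```

Under these assumptions the acoustic bit rate is at least 640 times the information rate of the underlying sound patterns.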