Authors:
Aurelio Uncini, Dipartimento di Elettronica e Automatica - Università di Ancona, Ancona, Italy (Italy)
Francesco Gobbi, Dipartimento di Elettronica e Automatica - Università di Ancona, Ancona, Italy (Italy)
Francesco Piazza, Dipartimento di Elettronica e Automatica - Università di Ancona, Ancona, Italy (Italy)
Page (NA) Paper number 1915
Abstract:
In this paper a new system for speech quality enhancement (SQE) is
presented. An SQE system attempts to recover the high and low frequencies
from a narrow-band speech signal, usually working as a post-processor
at the receiver side of a transmission system. The new system operates
directly in the frequency domain using complex-valued neural networks.
In order to reduce the computational burden and improve the generalization
capabilities, a new architecture based on a recently introduced neural
network, called adaptive spline neural network (ASNN), is employed.
Experimental results demonstrate the effectiveness of the proposed
method.
Authors:
Klaus Reinhard,
Mahesan Niranjan,
Page (NA) Paper number 1932
Abstract:
In this paper we report on the extension of capturing speech transitions
embedded in diphones using trajectory models. The slowly varying dynamics
of spectral trajectories carry much discriminant information that is
very crudely modelled by traditional approaches such as HMMs. We improved
our methodology of explicitly capturing the trajectory of short-time
spectral parameter vectors by introducing multi-trajectory concepts in
a probabilistic framework. Optimal subspace selection is presented
which finds the most discriminant plane for classification. Using the
E-set from the TIMIT database, results suggest that discriminant information
is preserved in the subspace.
Authors:
Daniel Rodriguez-Porcheron,
Marcos Faúndez-Zanuy,
Page (NA) Paper number 1290
Abstract:
This paper improves on the speaker recognition rates of an MLP classifier
and an LPCC codebook used alone, by linearly combining both methods.
In our simulations we have obtained an improvement of 4.7% over an LPCC
codebook of 32 vectors and 1.5% for a codebook of 128 vectors (error
rate drops from 3.68% to 2.1%). We also propose an efficient algorithm
that reduces the computational complexity of the LPCC-VQ system by
a factor of 4.
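As a rough illustration of the linear combination described above, the sketch below fuses per-speaker MLP outputs with per-speaker VQ distortions. The function name `combine_scores`, the min-max normalization, and the weight `alpha` are assumptions made for this example; the abstract does not state how the two scores are scaled or weighted.

```python
import numpy as np

def combine_scores(mlp_scores, vq_distortions, alpha=0.5):
    """Fuse two per-speaker score vectors with a linear combination.

    mlp_scores:     higher means more likely (hypothetical MLP outputs).
    vq_distortions: lower means more likely (hypothetical codebook distortions).
    alpha:          weight of the MLP score, an assumed free parameter.
    """
    mlp = np.asarray(mlp_scores, dtype=float)
    # Negate distortions so that "higher is better" holds for both inputs.
    vq = -np.asarray(vq_distortions, dtype=float)

    def minmax(x):
        # Rescale to [0, 1] so the two scores are comparable (assumed choice).
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    fused = alpha * minmax(mlp) + (1.0 - alpha) * minmax(vq)
    return int(np.argmax(fused)), fused

# Three candidate speakers; the two classifiers disagree on the best one.
speaker, fused = combine_scores([0.1, 0.6, 0.3], [4.0, 2.0, 1.0], alpha=0.5)
```

Setting `alpha` to 1 or 0 recovers the MLP-only or VQ-only decision, so the weight can be tuned on held-out data between those two extremes.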
Authors:
Holger Schwenk,
Page (NA) Paper number 2368
Abstract:
"Boosting" is a general method for improving the performance of almost
any learning algorithm. A recently proposed and very promising boosting
algorithm is AdaBoost. In this paper we investigate if AdaBoost can
be used to improve a hybrid HMM/neural network continuous speech recognizer.
Boosting significantly improves the word error rate from 6.3% to 5.3%
on a test set of the OGI Numbers95 corpus, a medium size continuous
numbers recognition task. These results compare favorably with other
combining techniques using several different feature representations
or additional information from longer time spans.
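For readers unfamiliar with AdaBoost, the following is a minimal sketch of the discrete two-class algorithm. It uses decision stumps as the weak learner purely as a stand-in; it is not the hybrid HMM/neural network system of the abstract, and all function names are hypothetical.

```python
import numpy as np

def stump_predict(X, j, thr, sign):
    # Threshold feature j; labels are in {-1, +1}.
    return sign * np.where(X[:, j] <= thr, -1, 1)

def fit_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) of the best stump."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                err = np.sum(w[stump_predict(X, j, thr, sign) != y])
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, rounds=10):
    w = np.full(len(y), 1.0 / len(y))        # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, j, thr, sign)
        w = w * np.exp(-alpha * y * pred)     # up-weight misclassified samples
        w = w / w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(X, j, t, s) for a, j, t, s in ensemble)
    return np.sign(score)

# Toy 1-D problem that no single stump can solve: the -1 class is an interval.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1, 1, -1, -1, 1, 1])
model = adaboost(X, y, rounds=3)
```

On this toy set a single stump misclassifies two of the six points, while the weighted vote of three stumps separates the classes.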
Authors:
Dan Ellis,
Nelson Morgan,
Page (NA) Paper number 2400
Abstract:
We have trained and tested a number of large neural networks for the
purpose of emission probability estimation in large vocabulary continuous
speech recognition. In particular, the problem under test is the DARPA
Broadcast News task. Our goal here was to determine the relationship
between training time, word error rate, size of the training set, and
size of the neural network. In all cases, the network architecture
was quite simple, comprising a single large hidden layer with an input
window consisting of feature vectors from 9 frames around the current
time, with a single output for each of 54 phonetic categories. Thus
far, simultaneous increases to the size of the training set and the
neural network improve performance; in other words, more data helps,
as does the training of more parameters. We continue to be surprised
that such a simple system works as well as it does for complex tasks.
Given a limitation in training time, however, there appears to be an
optimal ratio of training patterns to parameters of around 25:1 in
these circumstances. Additionally, doubling the training data and system
size appears to provide diminishing returns of error rate reduction
for the largest systems.
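The 25:1 ratio can be turned into a back-of-the-envelope sizing rule. The sketch below counts the weights and biases of a single-hidden-layer network and solves for the hidden layer size that matches a given number of training patterns. The 9-frame window and 54 outputs come from the abstract; the per-frame feature dimension of 13 is an assumption for illustration only, and both function names are hypothetical.

```python
def mlp_param_count(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected single-hidden-layer MLP."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

def hidden_units_for_ratio(n_patterns, n_in, n_out, ratio=25.0):
    """Hidden layer size giving roughly `ratio` training patterns per parameter."""
    target_params = n_patterns / ratio
    # Invert: params = n_hidden * (n_in + n_out + 1) + n_out
    return max(1, round((target_params - n_out) / (n_in + n_out + 1)))

FRAMES = 9           # context window, from the abstract
FEATURES = 13        # ASSUMPTION: per-frame feature dimension, not in the source
OUTPUTS = 54         # phonetic categories, from the abstract

n_in = FRAMES * FEATURES
params = mlp_param_count(n_in, 1000, OUTPUTS)
hidden = hidden_units_for_ratio(25 * params, n_in, OUTPUTS)
```

Under these assumptions, a 1000-unit hidden layer has 172,054 parameters, so the 25:1 rule would call for roughly 4.3 million training patterns.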
Authors:
Thiagarajan Balachander,
Ravi Kothari,
Page (NA) Paper number 1212
Abstract:
Subspace methods of pattern recognition form an interesting and popular
classification paradigm. The earliest subspace method of classification
was CLass Featuring Information Compression (CLAFIC), which associates
a linear subspace with each class. Local subspace classification methodologies,
which enhance classification power by associating multiple linear
subspaces with each class, have also been investigated. In this paper,
we introduce the Oriented Soft Regional Subspace Classifier (OS-RSC).
The highlights of this classifier are: (i) class-specific subspaces
are formed to maximize the average projection of one class
while minimizing that of the rival class; (ii) multiple manifolds are
formed for each class, increasing classification power; and (iii) soft sharing
of the training patterns allows for consistent classification
performance. It turns out that the cost function for forming class
specific subspaces is maximized for a subspace of unit dimensionality.
The performance of the proposed classifier is tested on real-world
classification problems.
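To make the baseline concrete, here is a minimal sketch of CLAFIC-style classification, the earlier method the abstract builds on, not the proposed OS-RSC. Each class gets a subspace spanned by the leading eigenvectors of its autocorrelation matrix, and a pattern is assigned to the class whose subspace captures the most projection energy. The function names and the toy data are hypothetical.

```python
import numpy as np

def fit_clafic(X, y, dim):
    """Return one orthonormal basis (columns) per class label."""
    bases = {}
    for label in np.unique(y):
        Xc = X[y == label]
        # Class autocorrelation matrix; CLAFIC does not subtract the mean.
        R = Xc.T @ Xc / len(Xc)
        _, eigvecs = np.linalg.eigh(R)    # eigenvalues in ascending order
        bases[label] = eigvecs[:, -dim:]  # keep the `dim` leading directions
    return bases

def predict_clafic(bases, X):
    labels = list(bases)
    # Squared norm of the projection of each pattern onto each class subspace.
    energy = np.stack(
        [np.sum((X @ bases[label]) ** 2, axis=1) for label in labels], axis=1
    )
    return np.array(labels)[np.argmax(energy, axis=1)]

# Toy data: class 0 lies near the x-axis, class 1 near the y-axis.
X = np.array([[1.0, 0.1], [2.0, -0.1], [-1.5, 0.05],
              [0.1, 1.0], [-0.1, 2.0], [0.05, -1.5]])
y = np.array([0, 0, 0, 1, 1, 1])
bases = fit_clafic(X, y, dim=1)
pred = predict_clafic(bases, np.array([[3.0, 0.2], [0.2, -3.0]]))
```

Because each subspace is fitted to one class in isolation, nothing discourages overlap between rival classes, which is the gap the class-specific cost function in the abstract is meant to address.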
Authors:
Mathini Sellathurai,
Simon Haykin,
Page (NA) Paper number 1935
Abstract:
In this paper, a new theory is developed for the feature spaces of
hyperbolic tangent used as an activation kernel for non-linear support
vector machines. The theory developed herein is based on the distinct
features of hyperbolic geometry, which leads to an interesting geometrical
interpretation of the higher-dimensional feature spaces of neural networks
using hyperbolic tangent as the activation function. The new theory
is used to explain the separability of hyperbolic tangent kernels where
we show that the separability is possible only for a certain class
of hyperbolic kernels. Simulation results are given supporting the
separability theory developed in this paper.
Authors:
Eisaku Maeda,
Hiroshi Murase,
Page (NA) Paper number 1595
Abstract:
The Kernel based Nonlinear Subspace (KNS) method is proposed for multi-class
pattern classification. This method consists of the nonlinear transformation
of feature spaces defined by kernel functions and subspace method in
transformed high-dimensional spaces. The Support Vector Machine, a
nonlinear classifier based on a kernel function technique, shows excellent
classification performance; however, its computational cost increases
exponentially with the number of patterns and classes. The linear subspace
method is a technique for multi-category classification, but it fails
when the pattern distribution has nonlinear characteristics or the
feature space dimension is low compared to the number of classes. The
proposed method combines the advantages of both techniques and realizes
multi-class nonlinear classifiers with better performance in less computational
time. In this paper, we show that a nonlinear subspace method can be
formulated by nonlinear transformations defined through kernel functions
and that its performance is better than that obtained by conventional
methods.
Authors:
David J Miller,
Lian Yan,
Page (NA) Paper number 2070
Abstract:
We develop new rules for combining estimates obtained from each classifier
in an ensemble. A variety of combination techniques have been previously
suggested, including averaging probability estimates, as well as hard
voting schemes. We introduce a critic associated with each classifier,
whose objective is to predict the classifier's errors. Since the critic
only tackles a two-class problem, its predictions are generally more
reliable than those of the classifier, and thus can be used as the
basis for our suggested improved combination rules. While previous
techniques are only effective when the individual classifier error
rate is p < 0.5, the new approach is successful, as proved under an
independence assumption, even when this condition is violated -- in
particular, so long as p + q < 1, with q the critic's error rate. More
generally, critic-driven combining achieves consistent, substantial
performance improvement over alternative methods, on a number of benchmark
data sets.
Authors:
Madiha Sabry-Rizk, EEIE Dept, City University, London, UK (U.K.)
Walid A Zgallai, EEIE Dept, City University, London, UK (U.K.)
Sahar El-Khafif, EEIE Dept, City University, London, UK (U.K.)
Ewart R Carson, MIM Centre, City University, London, UK (U.K.)
Kenneth T Grattan, EEIE Dept, City University, London, UK (U.K.)
Peter Thompson, Royal Free Hospital, London, UK (U.K.)
Page (NA) Paper number 1728
Abstract:
The paper describes a simple yet highly accurate multi-layer feed-forward
neural network classifier (based on the back-propagation algorithm)
specifically designed to successfully distinguish between normal and
abnormal higher-order statistics features of electrocardiogram (ECG)
signals. The concerned abnormality in ECG is associated with ventricular
late potentials (LP's) indicative of life threatening heart diseases.
LP's are defined as signals from areas of delayed conduction which
outlast the normal QRS period (80-100 msec). The QRS along with the
P and T waves constitute the heart beat cycle. This classifier incorporates
both pre-processing and adaptive weight adjustments across the input
layer during the training phase of the network to enhance extraction
of features pertinent to LP's found in 1-d cumulants. The latter is
deemed necessary to offset the low S/N ratio in the cumulant domains
concomitant to performing short data segmentation in order to capture
the LP's transient appearance. In this paper we summarize the procedures
of feature selection for neural network training, modification to the
back propagation algorithm to speed its rate of convergence, and the
pilot trial results of the neural ECG classifier.