Abstract: Session NNSP-1

NNSP-1.1
FREQUENCY RECOVERY OF NARROW-BAND SPEECH USING ADAPTIVE SPLINE NEURAL NETWORKS
Aurelio Uncini,
Francesco Gobbi,
Francesco Piazza (Dipartimento di Elettronica e Automatica - Università di Ancona, Ancona, Italy)
In this paper a new system for speech quality enhancement (SQE) is presented. An SQE system attempts to recover the high and low frequencies from a narrow-band speech signal, usually working as a post-processor at the receiver side of a transmission system. The new system operates directly in the frequency domain using complex-valued neural networks. In order to reduce the computational burden and improve the generalization capabilities, a new architecture based on a recently introduced neural network, called the adaptive spline neural network (ASNN), is employed. Experimental results demonstrate the effectiveness of the proposed method.
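The abstract gives only a block-level view of the system. As a loose illustration of a frequency-domain post-processor of this kind, the sketch below frames the narrow-band signal, transforms each frame, and lets a placeholder callable stand in for the paper's complex-valued spline network; the function name, frame length, hop size, and windowing are all assumptions, not details from the paper.

    import numpy as np

    def recover_wideband(narrowband, model, frame_len=256, hop=128):
        """Frame-by-frame frequency-domain enhancement loop (a sketch).

        `model` stands in for the paper's complex-valued network: any
        callable mapping a narrow-band complex spectrum
        (frame_len // 2 + 1 bins) to a full-band spectrum of the same size.
        """
        window = np.hanning(frame_len)
        out = np.zeros(len(narrowband))
        for start in range(0, len(narrowband) - frame_len + 1, hop):
            frame = narrowband[start:start + frame_len] * window
            spectrum = np.fft.rfft(frame)     # complex narrow-band spectrum
            full = model(spectrum)            # network fills in missing bands
            out[start:start + frame_len] += np.fft.irfft(full, n=frame_len) * window
        return out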

NNSP-1.2
DIPHONE MULTI-TRAJECTORY SUBSPACE MODELS
KLAUS REINHARD (CAMBRIDGE UNIVERSITY ENGINEERING DEPARTMENT),
MAHESAN NIRANJAN (CAMBRIDGE UNIVERSITY ENGINEERING DEPARTMENT)
In this paper we report on extending trajectory models to capture the speech transitions embedded in diphones. The slowly varying dynamics of spectral trajectories carry much discriminant information that is only very crudely modelled by traditional approaches such as HMMs. We improve our methodology of explicitly capturing the trajectory of short-time spectral parameter vectors by introducing multi-trajectory concepts in a probabilistic framework. An optimal subspace selection procedure is presented which finds the most discriminant plane for classification. Results on the E-set from the TIMIT database suggest that the discriminant information is preserved in the subspace.
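The abstract does not spell out its selection criterion. As one standard way of finding a "most discriminant plane", here is a plain multi-class Fisher LDA sketch in NumPy; the paper's probabilistic criterion may well differ, and every name below is ours.

    import numpy as np

    def discriminant_plane(X, y):
        """Project data onto the two most discriminant directions
        (textbook multi-class Fisher LDA, for illustration only)."""
        classes = np.unique(y)
        mean = X.mean(axis=0)
        d = X.shape[1]
        Sw = np.zeros((d, d))   # within-class scatter
        Sb = np.zeros((d, d))   # between-class scatter
        for c in classes:
            Xc = X[y == c]
            mc = Xc.mean(axis=0)
            Sw += (Xc - mc).T @ (Xc - mc)
            diff = (mc - mean)[:, None]
            Sb += len(Xc) * (diff @ diff.T)
        # the top two generalized eigenvectors of Sb w.r.t. Sw span the plane
        evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
        order = np.argsort(evals.real)[::-1]
        W = evecs[:, order[:2]].real
        return X @ W, W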

NNSP-1.3
Speaker recognition with an MLP classifier and LPCC codebook
Daniel Rodriguez-Porcheron (Universitat Politecnica de Catalunya),
Marcos Faundez-Zanuy (Escola Universitaria Politecnica de Mataro)
This paper improves on the speaker recognition rates of an MLP classifier and an LPCC codebook used alone, by means of a linear combination of the two methods. In our simulations we obtained an improvement of 4.7% over an LPCC codebook of 32 vectors and 1.5% over a codebook of 128 vectors (the error rate drops from 3.68% to 2.1%). We also propose an efficient algorithm that reduces the computational complexity of the LPCC-VQ system by a factor of 4.
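A minimal sketch of the kind of linear score fusion the abstract describes; the mixing weight, the min-max normalization, and all identifiers are assumptions rather than the paper's published combination.

    import numpy as np

    def combined_score(mlp_scores, vq_distortions, alpha=0.5):
        """Fuse the two systems' per-speaker scores linearly (a sketch).

        `mlp_scores`: MLP outputs per candidate speaker (higher = better).
        `vq_distortions`: average LPCC-VQ quantization error per speaker
        (lower = better), so it is flipped before mixing. The weight
        `alpha` would be tuned on held-out data.
        """
        mlp = (mlp_scores - mlp_scores.min()) / np.ptp(mlp_scores)
        vq = (vq_distortions.max() - vq_distortions) / np.ptp(vq_distortions)
        return alpha * mlp + (1.0 - alpha) * vq

    # recognized speaker = np.argmax(combined_score(mlp_out, vq_err))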

NNSP-1.4
Using Boosting to Improve a Hybrid HMM/Neural Network Speech Recognizer
Holger Schwenk (International Computer Science Institute, Berkeley)
"Boosting" is a general method for improving the performance
of almost any learning algorithm. A recently proposed and very promising
boosting algorithm is AdaBoost.
In this paper we investigate if AdaBoost can be used to improve
a hybrid HMM/neural network continuous speech recognizer.
Boosting significantly improves the word error rate from 6.3% to 5.3%
on a test set of the OGI Numbers95 corpus, a medium size continuous
numbers recognition task.
These results compare favorably with other combining techniques using
several different feature representations or additional information from
longer time spans.
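For reference, one round of the discrete AdaBoost re-weighting rule the abstract builds on; how the weights are fed back into hybrid HMM/NN training (utterance- vs. frame-level) is a design choice the paper addresses but this sketch does not.

    import numpy as np

    def adaboost_round(errors, weights):
        """One round of discrete AdaBoost re-weighting.

        `errors` is a boolean vector marking which training examples the
        current recognizer got wrong; `weights` is the current
        distribution over examples.
        """
        eps = np.sum(weights[errors]) / np.sum(weights)   # weighted error
        alpha = 0.5 * np.log((1.0 - eps) / eps)           # model's vote
        weights = weights * np.exp(np.where(errors, alpha, -alpha))
        return weights / weights.sum(), alpha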

NNSP-1.5
Size matters: An empirical study of neural network training for large vocabulary continuous speech recognition
Dan Ellis (International Computer Science Institute),
Nelson Morgan (International Computer Science Institute / U.C. Berkeley)
We have trained and tested a number of large neural networks for the purpose of emission probability estimation in large vocabulary continuous speech recognition. In particular, the problem under test is the DARPA Broadcast News task. Our goal here was to determine the relationship between training time, word error rate, size of the training set, and size of the neural network. In all cases, the network architecture was quite simple, comprising a single large hidden layer with an input window consisting of feature vectors from 9 frames around the current time, with a single output for each of 54 phonetic categories. Thus far, simultaneous increases to the size of the training set and the neural network improve performance; in other words, more data helps, as does the training of more parameters. We continue to be surprised that such a simple system works as well as it does for complex tasks. Given a limitation in training time, however, there appears to be an optimal ratio of training patterns to parameters of around 25:1 in these circumstances. Additionally, doubling the training data and system size appears to provide diminishing returns of error rate reduction for the largest systems.
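A back-of-the-envelope check of the quantities in the abstract; the per-frame feature dimension and the hidden layer size are assumed values for illustration only.

    # Parameter count for a single-hidden-layer net with a 9-frame input
    # window and 54 outputs (architecture from the abstract).
    feat_dim = 39     # assumed per-frame feature size (e.g. PLP + deltas)
    context = 9       # input frames around the current time (abstract)
    hidden = 4000     # assumed hidden layer size
    outputs = 54      # phonetic categories (abstract)

    params = (feat_dim * context + 1) * hidden + (hidden + 1) * outputs
    print(params)     # about 1.6M weights
    # The reported 25:1 pattern-to-parameter ratio would then call for
    # roughly 25 * params, i.e. on the order of 40M training frames.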

NNSP-1.6
Oriented Soft Localized Subspace Classification
Thiagarajan Balachander,
Ravi Kothari (University of Cincinnati)
Subspace methods of pattern recognition form an interesting and popular classification paradigm. The earliest subspace method of classification was CLAss-Featuring Information Compression (CLAFIC), which associated a linear subspace with each class. Local subspace classification methodologies, which enhance classification power by associating multiple linear subspaces with each class, have also been investigated. In this paper, we introduce the Oriented Soft Regional Subspace Classifier (OS-RSC). The highlights of this classifier are: (i) class-specific subspaces are formed to maximize the average projection of one class while minimizing that of the rival class; (ii) multiple manifolds are formed for each class, increasing classification power; and (iii) soft sharing of the training patterns allows for consistent classification performance. It turns out that the cost function for forming class-specific subspaces is maximized for a subspace of unit dimensionality. The performance of the proposed classifier is tested on real-world classification problems.
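For context, a sketch of the classical CLAFIC baseline the abstract starts from (one linear subspace per class, nearest-subspace decision); this is deliberately not the proposed OS-RSC, whose oriented, soft, multi-manifold construction is the paper's contribution.

    import numpy as np

    def fit_class_subspaces(X, y, dim=5):
        """One linear subspace per class, CLAFIC-style: the top right
        singular vectors of each class's data matrix (i.e. eigenvectors
        of its correlation matrix). `dim` is an assumed dimensionality."""
        bases = {}
        for c in np.unique(y):
            _, _, Vt = np.linalg.svd(X[y == c], full_matrices=False)
            bases[c] = Vt[:dim].T             # d x dim orthonormal basis
        return bases

    def classify(x, bases):
        """Assign x to the class whose subspace captures the largest
        squared projection of x."""
        return max(bases, key=lambda c: np.sum((bases[c].T @ x) ** 2))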

NNSP-1.7
The separability theory of hyperbolic tangent kernels and support vector machines for pattern classification
Mathini Sellathurai (Communications Research Lab, McMaster University),
Simon Haykin (Communications Research Lab, McMaster University)
In this paper, a new theory is developed for the feature spaces of the hyperbolic tangent used as an activation kernel for non-linear support vector machines. The theory developed herein is based on the distinct features of hyperbolic geometry, which leads to an interesting geometrical interpretation of the higher-dimensional feature spaces of neural networks using the hyperbolic tangent as the activation function. The new theory is used to explain the separability of hyperbolic tangent kernels, where we show that separability is possible only for a certain class of hyperbolic kernels. Simulation results are given supporting the separability theory developed in this paper.
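As a hands-on companion, the sigmoid (hyperbolic tangent) kernel is available in off-the-shelf SVM implementations; the toy run below uses scikit-learn, and the parameter choice reflects the common observation, consistent with the abstract's claim, that only some (gamma, coef0) settings behave like a valid kernel.

    import numpy as np
    from sklearn.svm import SVC

    # Toy two-class data
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(+1, 0.5, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)

    # K(x, x') = tanh(gamma * <x, x'> + coef0); this kernel is only
    # conditionally positive definite, and typically only gamma > 0 with
    # coef0 < 0 behaves well, echoing the restricted class of kernels
    # discussed in the abstract.
    clf = SVC(kernel="sigmoid", gamma=0.5, coef0=-1.0)
    clf.fit(X, y)
    print(clf.score(X, y))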

NNSP-1.8
Multi-Category Classification by Kernel-based Nonlinear Subspace Method
Eisaku Maeda,
Hiroshi Murase (NTT Basic Research Laboratories)
The Kernel-based Nonlinear Subspace (KNS) method is proposed for multi-class pattern classification. This method combines the nonlinear transformation of feature spaces defined by kernel functions with the subspace method applied in the transformed high-dimensional spaces. The Support Vector Machine, a nonlinear classifier based on a kernel function technique, shows excellent classification performance; however, its computational cost increases exponentially with the number of patterns and classes. The linear subspace method is a technique for multi-category classification, but it fails when the pattern distribution has nonlinear characteristics or the feature space dimension is low compared to the number of classes. The proposed method combines the advantages of both techniques and realizes multi-class nonlinear classifiers with better performance in less computational time. In this paper, we show that a nonlinear subspace method can be formulated through nonlinear transformations defined by kernel functions and that its performance is better than that obtained by conventional methods.
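A condensed sketch of a kernel subspace classifier in the spirit described above: each class gets a subspace spanned by the top eigenvectors of its (uncentered) kernel matrix, and a test point is scored by its normalized projection. The Gaussian kernel and the dimensions are assumptions; the paper's exact formulation may differ.

    import numpy as np

    def rbf(A, B, gamma=1.0):
        """Gaussian kernel matrix between row sets A and B (assumed kernel)."""
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)

    def fit_kernel_subspace(Xc, dim=10, gamma=1.0):
        """Per-class subspace: top eigenvectors of the class kernel matrix."""
        K = rbf(Xc, Xc, gamma)
        lam, U = np.linalg.eigh(K)            # ascending eigenvalues
        lam, U = lam[::-1][:dim], U[:, ::-1][:, :dim]
        return Xc, U / np.sqrt(lam)           # normalized expansion coefficients

    def score(x, model, gamma=1.0):
        """Squared projection of phi(x) onto the class subspace,
        normalized by ||phi(x)||^2 so class scores are comparable."""
        Xc, A = model
        kx = rbf(x[None, :], Xc, gamma).ravel()
        return np.sum((A.T @ kx) ** 2) / rbf(x[None, :], x[None, :], gamma)[0, 0]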

NNSP-1.9
Ensemble Classification by Critic-driven Combining
David J Miller,
Lian Yan (The Pennsylvania State University)
We develop new rules for combining estimates obtained from each classifier in an ensemble. A variety of combination techniques have been previously suggested, including averaging probability estimates, as well as hard voting schemes. We introduce a critic associated with each classifier, whose objective is to predict the classifier's errors. Since the critic only tackles a two-class problem, its predictions are generally more reliable than those of the classifier, and thus can be used as the basis for our suggested improved combination rules. While previous techniques are only effective when the individual classifier error rate is p < 0.5, the new approach is successful, as proved under an independence assumption, even when this condition is violated -- in particular, so long as p + q < 1, with q the critic's error rate. More generally, critic-driven combining achieves consistent, substantial performance improvement over alternative methods, on a number of benchmark data sets.
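One plausible instantiation of critic-driven combining, for intuition only (the paper derives its own rules): each classifier's vote is weighted by its critic's estimate of the probability that the classifier is correct on the current input.

    import numpy as np

    def critic_combined_vote(class_probs, critic_correct_probs):
        """Fuse ensemble outputs using per-classifier critics (a sketch).

        `class_probs`: (n_classifiers, n_classes) per-classifier estimates.
        `critic_correct_probs`: for each classifier, its critic's estimated
        probability that the classifier is right on this input.
        """
        # A critic score near 1 amplifies a classifier's vote; a score
        # below 0.5 down-weights a classifier the critic expects to err.
        w = np.asarray(critic_correct_probs)
        fused = (w[:, None] * np.asarray(class_probs)).sum(axis=0)
        return int(np.argmax(fused))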

NNSP-1.10
HIGHLY ACCURATE HIGHER ORDER STATISTICS BASED NEURAL NETWORK CLASSIFIER OF SPECIFIC ABNORMALITY IN ELECTROCARDIOGRAM SIGNALS
Madiha Sabry-Rizk,
Walid A Zgallai,
Sahar El-Khafif (EEIE Dept, City University, London, UK),
Ewart R Carson (MIM Centre, City University, London, UK),
Kenneth T Grattan (EEIE Dept, City University, London, UK),
Peter Thompson (Royal Free Hospital, London, UK)
The paper describes a simple yet highly accurate multi-layer feed-forward neural network classifier (based on the back-propagation algorithm) specifically designed to distinguish between normal and abnormal higher-order statistics features of electrocardiogram (ECG) signals. The abnormality concerned is associated with ventricular late potentials (LPs), which are indicative of life-threatening heart disease. LPs are defined as signals from areas of delayed conduction which outlast the normal QRS period (80-100 msec); the QRS, along with the P and T waves, constitutes the heart beat cycle. The classifier incorporates both pre-processing and adaptive weight adjustments across the input layer during the training phase of the network to enhance the extraction of features pertinent to LPs found in 1-d cumulants. The latter is deemed necessary to offset the low S/N ratio in the cumulant domains concomitant with the short data segmentation performed to capture the transient appearance of LPs. In this paper we summarize the procedures of feature selection for neural network training, the modification of the back-propagation algorithm to speed up its rate of convergence, and the results of a pilot trial of the neural ECG classifier.
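Since the features live in 1-d cumulant domains, here is a minimal NumPy estimate of a third-order cumulant slice of the kind such a front end would compute; the lag range and windowing are assumed, not taken from the paper.

    import numpy as np

    def third_order_cumulant(x, max_lag=16):
        """Sample 1-d third-order cumulant c3(t1, t2) of a short,
        zero-meaned ECG segment."""
        x = x - x.mean()
        n = len(x)
        c3 = np.zeros((max_lag, max_lag))
        for t1 in range(max_lag):
            for t2 in range(max_lag):
                m = n - max(t1, t2)
                c3[t1, t2] = np.mean(x[:m] * x[t1:t1 + m] * x[t2:t2 + m])
        return c3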