
Abstract: Session NNSP-1


NNSP-1.1  

PDF File of Paper Manuscript
FREQUENCY RECOVERY OF NARROW-BAND SPEECH USING ADAPTIVE SPLINE NEURAL NETWORKS
Aurelio Uncini, Francesco Gobbi, Francesco Piazza (Dipartimento di Elettronica e Automatica - Università di Ancona, Ancona, Italy)

In this paper a new system for speech quality enhancement (SQE) is presented. A SQE system attempts to recover the high and low frequencies from a narrow-band speech signal, usually working as a post-processor at the receiver side of a transmission system. The new system operates directly in the frequency domain using complex-valued neural networks. In order to reduce the computational burden and improve the generalization capabilities, a new architecture based on a recently introduced neural network, called adaptive spline neural network (ASNN), is employed. Experimental results demonstrate the effectiveness of the proposed method.
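The frequency-domain post-processing described above can be sketched as a per-frame pipeline. Here `estimate_high_band` is a hypothetical placeholder (crude spectral folding), not the paper's complex-valued spline network, and the frame length is an assumed value:

```python
import numpy as np

def estimate_high_band(low_bins):
    # Hypothetical stand-in for the paper's network: fold the
    # narrow-band spectrum back on itself, scaled down.
    return low_bins[::-1] * 0.1

def widen_frame(frame):
    # Recover missing high-frequency bins of one speech frame in the
    # frequency domain, then return to the time domain.
    spec = np.fft.rfft(frame)
    n_low = len(spec) // 2 + 1                    # bins actually present
    low = spec[:n_low]
    high = estimate_high_band(low)[: len(spec) - n_low]
    return np.fft.irfft(np.concatenate([low, high]), n=len(frame))

frame = np.cos(2 * np.pi * 5 * np.arange(256) / 256)  # toy narrow-band frame
widened = widen_frame(frame)
```

The low band passes through unchanged; only the synthesized high bins differ from the input.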


NNSP-1.2  

PDF File of Paper Manuscript
DIPHONE MULTI-TRAJECTORY SUBSPACE MODELS
Klaus Reinhard, Mahesan Niranjan (Cambridge University Engineering Department)

In this paper we report on extensions to capturing speech transitions embedded in diphones using trajectory models. The slowly varying dynamics of spectral trajectories carry much discriminant information that is only crudely modelled by traditional approaches such as HMMs. We improve our methodology of explicitly capturing the trajectory of short-time spectral parameter vectors by introducing multi-trajectory concepts in a probabilistic framework. An optimal subspace selection is presented which finds the most discriminant plane for classification. Using the E-set from the TIMIT database, results suggest that discriminant information is preserved in the subspace.


NNSP-1.3  

PDF File of Paper Manuscript
Speaker recognition with a MLP classifier and LPCC codebook
Daniel Rodriguez-Porcheron (Universitat Politecnica de Catalunya), Marcos Faundez-Zanuy (Escola Universitaria Politecnica de Mataro)

This paper improves the speaker recognition rates of an MLP classifier and an LPCC codebook alone by using a linear combination of the two methods. In our simulations we obtained an improvement of 4.7% over an LPCC codebook of 32 vectors and 1.5% for a codebook of 128 vectors (the error rate drops from 3.68% to 2.1%). We also propose an efficient algorithm that reduces the computational complexity of the LPCC-VQ system by a factor of 4.
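As a rough illustration of this kind of linear score combination (all names, shapes, and the weight value below are assumptions for the sketch, not taken from the paper):

```python
import numpy as np

def vq_score(features, codebook):
    # Mean quantization distortion of the frames against one speaker's
    # codebook; lower distortion means a better match, so negate it
    # to get a score that is larger for the true speaker.
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return -d.min(axis=1).mean()

def combined_score(mlp_score, features, codebook, alpha=0.7):
    # Linear combination of the two classifiers' scores; alpha is an
    # illustrative mixing weight, tuned on held-out data in practice.
    return alpha * mlp_score + (1.0 - alpha) * vq_score(features, codebook)

rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 12))   # 50 frames of 12 LPCC coefficients
cb = rng.normal(size=(32, 12))      # a 32-vector codebook
score = combined_score(0.9, feats, cb)
```

The speaker maximizing the combined score over all enrolled models would be selected.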


NNSP-1.4  

PDF File of Paper Manuscript
Using Boosting to Improve a Hybrid HMM/Neural Network Speech Recognizer
Holger Schwenk (International Computer Science Institute, Berkeley)

"Boosting" is a general method for improving the performance of almost any learning algorithm. A recently proposed and very promising boosting algorithm is AdaBoost. In this paper we investigate if AdaBoost can be used to improve a hybrid HMM/neural network continuous speech recognizer. Boosting significantly improves the word error rate from 6.3% to 5.3% on a test set of the OGI Numbers95 corpus, a medium size continuous numbers recognition task. These results compare favorably with other combining techniques using several different feature representations or additional information from longer time spans.


NNSP-1.5  

PDF File of Paper Manuscript
Size matters: An empirical study of neural network training for large vocabulary continuous speech recognition
Dan Ellis (International Computer Science Institute), Nelson Morgan (International Computer Science Institute / U.C. Berkeley)

We have trained and tested a number of large neural networks for the purpose of emission probability estimation in large vocabulary continuous speech recognition. In particular, the problem under test is the DARPA Broadcast News task. Our goal here was to determine the relationship between training time, word error rate, size of the training set, and size of the neural network. In all cases, the network architecture was quite simple, comprising a single large hidden layer with an input window consisting of feature vectors from 9 frames around the current time, with a single output for each of 54 phonetic categories. Thus far, simultaneous increases to the size of the training set and the neural network improve performance; in other words, more data helps, as does the training of more parameters. We continue to be surprised that such a simple system works as well as it does for complex tasks. Given a limitation in training time, however, there appears to be an optimal ratio of training patterns to parameters of around 25:1 in these circumstances. Additionally, doubling the training data and system size appears to provide diminishing returns of error rate reduction for the largest systems.
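The 25:1 patterns-to-parameters rule of thumb reported above is easy to make concrete; the feature dimension and hidden-layer size below are assumed values for illustration, not figures from the paper:

```python
def mlp_params(feat_dim, context, hidden, outputs):
    # Weights plus biases of a single-hidden-layer MLP that sees a
    # window of `context` frames of `feat_dim` features.
    inputs = feat_dim * context
    return (inputs + 1) * hidden + (hidden + 1) * outputs

# Assumed: 13 features/frame, 9-frame window, 2000 hidden units,
# 54 phonetic outputs (the window and output sizes are from the text).
params = mlp_params(feat_dim=13, context=9, hidden=2000, outputs=54)
patterns = 25 * params   # training patterns suggested by the 25:1 ratio
```

At roughly 100 frames per second of speech, the resulting pattern count translates directly into hours of training audio.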


NNSP-1.6  

PDF File of Paper Manuscript
Oriented Soft Localized Subspace Classification
Thiagarajan Balachander, Ravi Kothari (University of Cincinnati)

Subspace methods of pattern recognition form an interesting and popular classification paradigm. The earliest subspace method of classification was CLass Featuring Information Compression (CLAFIC), which associated a linear subspace with each class. Local subspace classification methodologies, which enhance classification power by associating multiple linear subspaces with each class, have also been investigated. In this paper, we introduce the Oriented Soft Regional Subspace Classifier (OS-RSC). The highlights of this classifier are: (i) class-specific subspaces are formed to maximize the average projection of one class while minimizing that of the rival class; (ii) multiple manifolds are formed for each class, increasing classification power; and (iii) soft sharing of the training patterns allows for consistent classification performance. It turns out that the cost function for forming class-specific subspaces is maximized for a subspace of unit dimensionality. The performance of the proposed classifier is tested on real-world classification problems.
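For context, the baseline CLAFIC-style rule that such local subspace methods extend can be sketched as: fit a linear subspace per class, then pick the class with the largest squared projection. This is a minimal sketch of the classical rule, not the OS-RSC algorithm itself:

```python
import numpy as np

def fit_subspaces(X, y, dim=1):
    # One linear subspace per class, spanned by the leading right
    # singular vectors of that class's data matrix.
    bases = {}
    for c in np.unique(y):
        _, _, vt = np.linalg.svd(X[y == c], full_matrices=False)
        bases[c] = vt[:dim]
    return bases

def classify(bases, x):
    # Largest squared projection norm onto a class subspace wins.
    return max(bases, key=lambda c: np.sum((bases[c] @ x) ** 2))
```

OS-RSC departs from this by orienting each subspace against a rival class and by softly sharing patterns across multiple subspaces per class.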


NNSP-1.7  

PDF File of Paper Manuscript
The separability theory of hyperbolic tangent kernels and support vector machines for pattern classification
Mathini Sellathurai (Communications Research Lab, McMaster University), Simon Haykin (Communications research lab, McMaster University)

In this paper, a new theory is developed for the feature spaces of the hyperbolic tangent used as an activation kernel for non-linear support vector machines. The theory developed herein is based on the distinct features of hyperbolic geometry, which leads to an interesting geometrical interpretation of the higher-dimensional feature spaces of neural networks using the hyperbolic tangent as the activation function. The new theory is used to explain the separability of hyperbolic tangent kernels, where we show that separability is possible only for a certain class of hyperbolic kernels. Simulation results are given supporting the separability theory developed in this paper.
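One way to see why only certain hyperbolic tangent kernels behave well: for some parameter choices the Gram matrix of K(x, y) = tanh(a x·y + b) is indefinite, so it is not a valid Mercer kernel. A small numerical check (the example points and parameters are arbitrary, chosen only to exhibit a negative eigenvalue):

```python
import numpy as np

def min_gram_eigenvalue(X, a, b):
    # Smallest eigenvalue of the tanh-kernel Gram matrix over the
    # rows of X; a negative value means the kernel is indefinite
    # on this point set.
    K = np.tanh(a * (X @ X.T) + b)
    return np.linalg.eigvalsh(K).min()

X = np.array([[1.], [-1.], [2.]])
lam_min = min_gram_eigenvalue(X, a=1.0, b=0.0)   # negative here
```

For parameter choices where the Gram matrix stays positive semidefinite on the data, the usual SVM machinery applies; the paper characterizes such cases geometrically.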


NNSP-1.8  

PDF File of Paper Manuscript
Multi-Category Classification by Kernel based Nonlinear Subspace Method
Eisaku Maeda, Hiroshi Murase (NTT Basic Research Laboratories)

The Kernel based Nonlinear Subspace (KNS) method is proposed for multi-class pattern classification. This method consists of the nonlinear transformation of feature spaces defined by kernel functions and the subspace method in the transformed high-dimensional spaces. The Support Vector Machine, a nonlinear classifier based on a kernel function technique, shows excellent classification performance; however, its computational cost increases exponentially with the number of patterns and classes. The linear subspace method is a technique for multi-category classification, but it fails when the pattern distribution has nonlinear characteristics or the feature space dimension is low compared to the number of classes. The proposed method combines the advantages of both techniques and realizes multi-class nonlinear classifiers with better performance in less computational time. In this paper, we show that a nonlinear subspace method can be formulated by nonlinear transformations defined through kernel functions and that its performance is better than that obtained by conventional methods.
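In this spirit, a kernel subspace classifier can be sketched as per-class uncentered kernel PCA followed by a projection-norm decision. The RBF kernel and all sizes here are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and B.
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

class KernelSubspace:
    def __init__(self, dim=2, gamma=1.0):
        self.dim, self.gamma = dim, gamma

    def fit(self, X, y):
        self.models = {}
        for c in np.unique(y):
            Xc = X[y == c]
            K = rbf(Xc, Xc, self.gamma)
            vals, vecs = np.linalg.eigh(K)
            keep = vals.argsort()[::-1][: self.dim]
            # Normalized expansion coefficients of the leading
            # kernel principal components for this class.
            self.models[c] = (Xc, vecs[:, keep] / np.sqrt(vals[keep]))
        return self

    def predict_one(self, x):
        def score(c):
            Xc, A = self.models[c]
            k = rbf(x[None, :], Xc, self.gamma)[0]
            return np.sum((k @ A) ** 2)   # squared projection norm
        return max(self.models, key=score)
```

Because each class is handled independently, adding a class only adds one more small eigendecomposition rather than retraining a joint model.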


NNSP-1.9  

PDF File of Paper Manuscript
Ensemble Classification by Critic-driven Combining
David J Miller, Lian Yan (The Pennsylvania State University)

We develop new rules for combining estimates obtained from each classifier in an ensemble. A variety of combination techniques have been previously suggested, including averaging probability estimates, as well as hard voting schemes. We introduce a critic associated with each classifier, whose objective is to predict the classifier's errors. Since the critic only tackles a two-class problem, its predictions are generally more reliable than those of the classifier, and thus can be used as the basis for our suggested improved combination rules. While previous techniques are only effective when the individual classifier error rate is p < 0.5, the new approach is successful, as proved under an independence assumption, even when this condition is violated -- in particular, so long as p + q < 1, with q the critic's error rate. More generally, critic-driven combining achieves consistent, substantial performance improvement over alternative methods, on a number of benchmark data sets.
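A toy Monte-Carlo reduction of the critic idea (not the paper's actual combination rule): flip the classifier's decision whenever the critic flags an error. Under the independence assumption this simple rule's error rate collapses to the critic's q, so the pair can succeed even when the classifier's own error rate p exceeds 0.5:

```python
import numpy as np

def combined_error(p, q, n=200_000, seed=0):
    # Simulate a two-class problem: the classifier errs with rate p,
    # its critic errs with rate q (independently). The critic's flag
    # is "classifier erred"; it is right exactly when the critic
    # itself is not in error.
    rng = np.random.default_rng(seed)
    clf_wrong = rng.random(n) < p
    critic_wrong = rng.random(n) < q
    flagged = clf_wrong ^ critic_wrong       # critic's error flag
    final_wrong = clf_wrong ^ flagged        # flip when flagged
    return final_wrong.mean()
```

For example, with p = 0.6 and q = 0.3 (so p + q < 1 holds despite p > 0.5), the combined decision is wrong only about 30% of the time.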


NNSP-1.10  

PDF File of Paper Manuscript
HIGHLY ACCURATE HIGHER ORDER STATISTICS BASED NEURAL NETWORK CLASSIFIER OF SPECIFIC ABNORMALITY IN ELECTROCARDIOGRAM SIGNALS
Madiha Sabry-Rizk, Walid A Zgallai, Sahar El-Khafif (EEIE Dept, City University, London, UK), Ewart R Carson (MIM Centre, City University, London, UK), Kenneth T Grattan (EEIE Dept, City University, London, UK), Peter Thompson (Royal Free Hospital, London, UK)

The paper describes a simple yet highly accurate multi-layer feed-forward neural network classifier (based on the back-propagation algorithm) specifically designed to distinguish between normal and abnormal higher-order statistics features of electrocardiogram (ECG) signals. The abnormality of concern in the ECG is associated with ventricular late potentials (LPs) indicative of life-threatening heart diseases. LPs are defined as signals from areas of delayed conduction which outlast the normal QRS period (80-100 msec). The QRS, along with the P and T waves, constitutes the heart beat cycle. This classifier incorporates both pre-processing and adaptive weight adjustments across the input layer during the training phase of the network to enhance the extraction of features pertinent to LPs found in 1-D cumulants. The latter is deemed necessary to offset the low S/N ratio in the cumulant domains concomitant with performing short data segmentation in order to capture the LPs' transient appearance. In this paper we summarize the procedures of feature selection for neural network training, the modification to the back-propagation algorithm to speed its rate of convergence, and the pilot trial results of the neural ECG classifier.
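The 1-D cumulant features mentioned above can be illustrated with a biased sample estimate of a diagonal third-order cumulant slice; the exact slice and normalization used in the paper are not specified here, so this particular form is an assumption:

```python
import numpy as np

def cumulant3_diag(x, max_lag):
    # Biased sample estimate of the diagonal third-order cumulant
    # slice c3(tau) = E[x(n) * x(n+tau)^2] for a zero-mean signal.
    # Third-order cumulants vanish for symmetric (e.g. Gaussian)
    # signals, which is why they highlight transient asymmetries.
    x = np.asarray(x, float)
    x = x - x.mean()
    n = len(x)
    return np.array([np.mean(x[: n - t] * x[t:] ** 2)
                     for t in range(max_lag + 1)])
```

A window of such cumulant values over the terminal QRS segment would form one input vector to the classifier.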




Last Update:  February 4, 1999         Ingo Höntsch