Application - Pattern Recognition & Speech Processing

Frequency Recovery Of Narrow-Band Speech Using Adaptive Spline Neural Networks

Authors:

Aurelio Uncini, Dipartimento di Elettronica e Automatica - Università di Ancona, Ancona, Italy
Francesco Gobbi, Dipartimento di Elettronica e Automatica - Università di Ancona, Ancona, Italy
Francesco Piazza, Dipartimento di Elettronica e Automatica - Università di Ancona, Ancona, Italy

Page (NA) Paper number 1915

Abstract:

In this paper a new system for speech quality enhancement (SQE) is presented. An SQE system attempts to recover the high and low frequencies from a narrow-band speech signal, usually working as a post-processor at the receiver side of a transmission system. The new system operates directly in the frequency domain using complex-valued neural networks. In order to reduce the computational burden and improve the generalization capabilities, a new architecture based on a recently introduced neural network, the adaptive spline neural network (ASNN), is employed. Experimental results demonstrate the effectiveness of the proposed method.

IC991915.PDF (From Author) IC991915.PDF (Rasterized)


Diphone Multi-Trajectory Subspace Models

Authors:

Klaus Reinhard
Mahesan Niranjan

Page (NA) Paper number 1932

Abstract:

In this paper we report on an extension of our work on capturing speech transitions embedded in diphones using trajectory models. The slowly varying dynamics of spectral trajectories carry much discriminant information that is very crudely modelled by traditional approaches such as HMMs. We improved our methodology for explicitly capturing the trajectory of short-time spectral parameter vectors by introducing multi-trajectory concepts in a probabilistic framework. An optimal subspace selection method is presented which finds the most discriminant plane for classification. Results on the E-set from the TIMIT database suggest that discriminant information is preserved in the subspace.

IC991932.PDF (From Author) IC991932.PDF (Rasterized)


Speaker Recognition With A MLP Classifier And LPCC Codebook

Authors:

Daniel Rodriguez-Porcheron
Marcos Faúndez-Zanuy

Page (NA) Paper number 1290

Abstract:

This paper improves on the speaker recognition rates obtained with an MLP classifier alone and with an LPCC codebook alone by using a linear combination of both methods. In our simulations we obtained an improvement of 4.7% over an LPCC codebook of 32 vectors and 1.5% over a codebook of 128 vectors (the error rate drops from 3.68% to 2.1%). We also propose an efficient algorithm that reduces the computational complexity of the LPCC-VQ system by a factor of 4.
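
The linear combination mentioned in the abstract can be pictured with a small score-fusion sketch. The abstract does not give the combination rule, so everything here is an assumption made for illustration: the min-max normalisation, the weight `alpha`, and the function names are hypothetical, not the authors' method.

```python
def min_max(values):
    """Scale a list of scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_scores(mlp_outputs, vq_distortions, alpha=0.5):
    """Linearly combine per-speaker MLP outputs and VQ codebook distortions.

    mlp_outputs: one value per enrolled speaker, higher means a better match.
    vq_distortions: one value per enrolled speaker, lower means a better match.
    alpha: hypothetical weight on the MLP score, in [0, 1].
    """
    if len(mlp_outputs) != len(vq_distortions):
        raise ValueError("need one score per speaker from each system")
    mlp = min_max(mlp_outputs)
    # Flip the distortions so that larger always means "better match".
    vq = [1.0 - d for d in min_max(vq_distortions)]
    return [alpha * m + (1.0 - alpha) * v for m, v in zip(mlp, vq)]


def identify(mlp_outputs, vq_distortions, alpha=0.5):
    """Return the index of the speaker with the highest fused score."""
    fused = fuse_scores(mlp_outputs, vq_distortions, alpha)
    return max(range(len(fused)), key=fused.__getitem__)
```

With `alpha=1.0` the decision reduces to the MLP alone and with `alpha=0.0` to the codebook alone, so the weight can be tuned on held-out data between the two extremes.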

IC991290.PDF (From Author) IC991290.PDF (Rasterized)


Using Boosting to Improve a Hybrid HMM/Neural Network Speech Recognizer

Authors:

Holger Schwenk

Page (NA) Paper number 2368

Abstract:

"Boosting" is a general method for improving the performance of almost any learning algorithm. A recently proposed and very promising boosting algorithm is AdaBoost. In this paper we investigate whether AdaBoost can be used to improve a hybrid HMM/neural network continuous speech recognizer. Boosting significantly reduces the word error rate from 6.3% to 5.3% on a test set of the OGI Numbers95 corpus, a medium-size continuous numbers recognition task. These results compare favorably with other combining techniques using several different feature representations or additional information from longer time spans.
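
For readers unfamiliar with AdaBoost, the reweighting step at its core can be sketched as follows. The abstract does not say which AdaBoost variant was used or how it was coupled to the HMM, so this is only the textbook AdaBoost.M1 update; the function name is hypothetical.

```python
import math


def adaboost_m1_round(weights, is_correct):
    """One AdaBoost.M1 reweighting step.

    weights: current example weights (assumed to sum to 1).
    is_correct: booleans, True where the weak classifier was right.
    Returns (new_weights, vote), where vote = log(1 / beta) is the weight
    of this classifier in the final weighted vote.
    """
    eps = sum(w for w, ok in zip(weights, is_correct) if not ok)
    if not 0.0 < eps < 0.5:
        raise ValueError("AdaBoost.M1 needs a weighted error in (0, 0.5)")
    beta = eps / (1.0 - eps)
    # Shrink the weight of examples that were classified correctly ...
    scaled = [w * beta if ok else w for w, ok in zip(weights, is_correct)]
    # ... then renormalise so the weights form a distribution again.
    total = sum(scaled)
    return [w / total for w in scaled], math.log(1.0 / beta)
```

After the update the misclassified examples carry half of the total weight, which forces the next classifier in the ensemble to concentrate on them.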

IC992368.PDF (From Author) IC992368.PDF (Rasterized)


Size Matters: An Empirical Study Of Neural Network Training For Large Vocabulary Continuous Speech Recognition

Authors:

Dan Ellis
Nelson Morgan

Page (NA) Paper number 2400

Abstract:

We have trained and tested a number of large neural networks for the purpose of emission probability estimation in large vocabulary continuous speech recognition. In particular, the problem under test is the DARPA Broadcast News task. Our goal here was to determine the relationship between training time, word error rate, size of the training set, and size of the neural network. In all cases, the network architecture was quite simple, comprising a single large hidden layer with an input window consisting of feature vectors from 9 frames around the current time, with a single output for each of 54 phonetic categories. Thus far, simultaneous increases to the size of the training set and the neural network improve performance; in other words, more data helps, as does the training of more parameters. We continue to be surprised that such a simple system works as well as it does for complex tasks. Given a limitation in training time, however, there appears to be an optimal ratio of training patterns to parameters of around 25:1 in these circumstances. Additionally, doubling the training data and system size appears to provide diminishing returns of error rate reduction for the largest systems.
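
The 25:1 ratio of training patterns to parameters can be turned into a back-of-the-envelope calculation. The 9-frame window and the 54 outputs come from the abstract; the per-frame feature dimension and the hidden layer size used below are hypothetical, and whether the authors count biases as parameters is not stated.

```python
def mlp_parameter_count(feature_dim, hidden_units, context_frames=9, outputs=54):
    """Weights plus biases of an MLP with a single hidden layer."""
    inputs = context_frames * feature_dim
    hidden_layer = (inputs + 1) * hidden_units  # +1 for each unit's bias
    output_layer = (hidden_units + 1) * outputs
    return hidden_layer + output_layer


def patterns_for_ratio(parameters, ratio=25):
    """Training patterns (frames) suggested by a patterns-to-parameters ratio."""
    return ratio * parameters


# Hypothetical configuration: 13 features per frame, 1000 hidden units.
params = mlp_parameter_count(feature_dim=13, hidden_units=1000)
print(params, patterns_for_ratio(params))  # 172054 4301350
```

Under these assumed sizes the network has about 172,000 parameters, so the reported ratio would correspond to roughly 4.3 million training frames.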

IC992400.PDF (From Author) IC992400.PDF (Rasterized)


Oriented Soft Localized Subspace Classification

Authors:

Thiagarajan Balachander
Ravi Kothari

Page (NA) Paper number 1212

Abstract:

Subspace methods of pattern recognition form an interesting and popular classification paradigm. The earliest subspace method of classification was the CLass Featuring Information Compression (CLAFIC) method, which associates a linear subspace with each class. Local subspace classification methodologies, which enhance classification power by associating multiple linear subspaces with each class, have also been investigated. In this paper, we introduce the Oriented Soft Regional Subspace Classifier (OS-RSC). The highlights of this classifier are: (i) class-specific subspaces are formed to specifically maximize the average projection of one class while minimizing that of the rival class; (ii) multiple manifolds are formed for each class, increasing classification power; (iii) soft sharing of the training patterns again allows for consistent classification performance. It turns out that the cost function for forming class-specific subspaces is maximized for a subspace of unit dimensionality. The performance of the proposed classifier is tested on real-world classification problems.
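
As background for the baseline named in the abstract, a CLAFIC-style classifier can be sketched in a few lines. This is not the proposed OS-RSC: it omits the oriented cost function, the multiple manifolds, and the soft sharing. It also assumes, for brevity, one one-dimensional subspace per class found by power iteration; the function names and the toy data are hypothetical.

```python
import math


def leading_direction(samples, iterations=100):
    """Top eigenvector of the class autocorrelation matrix (power iteration)."""
    dim = len(samples[0])
    # R = (1/N) * sum of x x^T over the class samples, without mean removal.
    r = [[sum(x[i] * x[j] for x in samples) / len(samples) for j in range(dim)]
         for i in range(dim)]
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(r[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def classify(x, directions):
    """Pick the class whose 1-D subspace captures the most energy of x."""
    def energy(u):
        return sum(a * b for a, b in zip(x, u)) ** 2
    return max(directions, key=lambda label: energy(directions[label]))


# Toy data: class "A" lies near the first axis, class "B" near the second.
directions = {
    "A": leading_direction([(1.0, 0.1), (2.0, -0.1), (1.5, 0.0)]),
    "B": leading_direction([(0.1, 1.0), (-0.1, 2.0), (0.0, 1.5)]),
}
print(classify((2.0, 0.3), directions))  # A
```

A pattern is assigned to the class whose subspace yields the largest squared projection, which is the decision rule that the localized and oriented variants build on.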

IC991212.PDF (From Author) IC991212.PDF (Rasterized)


The Separability Theory Of Hyperbolic Tangent Kernels And Support Vector Machines For Pattern Classification

Authors:

Mathini Sellathurai
Simon Haykin

Page (NA) Paper number 1935

Abstract:

In this paper, a new theory is developed for the feature spaces of hyperbolic tangent used as an activation kernel for non-linear support vector machines. The theory developed herein is based on the distinct features of hyperbolic geometry, which leads to an interesting geometrical interpretation of the higher-dimensional feature spaces of neural networks using hyperbolic tangent as the activation function. The new theory is used to explain the separability of hyperbolic tangent kernels, where we show that separability is possible only for a certain class of hyperbolic kernels. Simulation results are given supporting the separability theory developed in this paper.
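
The sketch below is not the separability theory of the paper. It illustrates a related and well-known fact, namely that the hyperbolic tangent kernel k(x, y) = tanh(a<x, y> + b) yields a positive semidefinite Gram matrix only for some choices of a and b. The parameter values and function names are chosen purely for illustration.

```python
import math


def tanh_kernel(x, y, a=1.0, b=0.0):
    """Hyperbolic tangent kernel k(x, y) = tanh(a * <x, y> + b)."""
    return math.tanh(a * sum(p * q for p, q in zip(x, y)) + b)


def min_eigenvalue_2x2(x1, x2, a, b):
    """Smallest eigenvalue of the 2x2 Gram matrix built from two points."""
    p = tanh_kernel(x1, x1, a, b)
    q = tanh_kernel(x1, x2, a, b)
    r = tanh_kernel(x2, x2, a, b)
    # Closed form for the symmetric matrix [[p, q], [q, r]].
    return (p + r) / 2.0 - math.sqrt(((p - r) / 2.0) ** 2 + q ** 2)


# With a = 1 and b = -1 the diagonal entry k(0, 0) = tanh(-1) is negative,
# so the Gram matrix cannot be positive semidefinite.
print(min_eigenvalue_2x2((0.0, 0.0), (1.0, 1.0), a=1.0, b=-1.0) < 0.0)  # True
```

A negative eigenvalue means that no inner-product feature space reproduces the kernel on these points, which is one reason the admissible parameter range of tanh kernels has to be characterised carefully.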

IC991935.PDF (From Author) IC991935.PDF (Rasterized)


Multi-Category Classification by Kernel Based Nonlinear Subspace Method

Authors:

Eisaku Maeda
Hiroshi Murase

Page (NA) Paper number 1595

Abstract:

The Kernel-based Nonlinear Subspace (KNS) method is proposed for multi-class pattern classification. This method consists of a nonlinear transformation of feature spaces defined by kernel functions and a subspace method applied in the transformed high-dimensional spaces. The Support Vector Machine, a nonlinear classifier based on a kernel function technique, shows excellent classification performance; however, its computational cost increases exponentially with the number of patterns and classes. The linear subspace method is a technique for multi-category classification, but it fails when the pattern distribution has nonlinear characteristics or the feature space dimension is low compared to the number of classes. The proposed method combines the advantages of both techniques and realizes multi-class nonlinear classifiers with better performance in less computational time. In this paper, we show that a nonlinear subspace method can be formulated by nonlinear transformations defined through kernel functions and that its performance is better than that obtained by conventional methods.

IC991595.PDF (From Author) IC991595.PDF (Rasterized)


Ensemble Classification by Critic-driven Combining

Authors:

David J Miller
Lian Yan

Page (NA) Paper number 2070

Abstract:

We develop new rules for combining estimates obtained from each classifier in an ensemble. A variety of combination techniques have been previously suggested, including averaging probability estimates, as well as hard voting schemes. We introduce a critic associated with each classifier, whose objective is to predict the classifier's errors. Since the critic only tackles a two-class problem, its predictions are generally more reliable than those of the classifier, and thus can be used as the basis for our suggested improved combination rules. While previous techniques are only effective when the individual classifier error rate is p < 0.5, the new approach is successful, as proved under an independence assumption, even when this condition is violated -- in particular, so long as p + q < 1, with q the critic's error rate. More generally, critic-driven combining achieves consistent, substantial performance improvement over alternative methods, on a number of benchmark data sets.

IC992070.PDF (From Author) IC992070.PDF (Rasterized)


Highly Accurate Higher Order Statistics Based Neural Network Classifier Of Specific Abnormality In Electrocardiogram Signals

Authors:

Madiha Sabry-Rizk, EEIE Dept, City University, London, UK
Walid A Zgallai, EEIE Dept, City University, London, UK
Sahar El-Khafif, EEIE Dept, City University, London, UK
Ewart R Carson, MIM Centre, City University, London, UK
Kenneth T Grattan, EEIE Dept, City University, London, UK
Peter Thompson, Royal Free Hospital, London, UK

Page (NA) Paper number 1728

Abstract:

The paper describes a simple yet highly accurate multi-layer feed-forward neural network classifier (based on the back-propagation algorithm) specifically designed to successfully distinguish between normal and abnormal higher-order statistics features of electrocardiogram (ECG) signals. The concerned abnormality in ECG is associated with ventricular late potentials (LP's) indicative of life-threatening heart diseases. LP's are defined as signals from areas of delayed conduction which outlast the normal QRS period (80-100 msec). The QRS along with the P and T waves constitute the heart beat cycle. This classifier incorporates both pre-processing and adaptive weight adjustments across the input layer during the training phase of the network to enhance extraction of features pertinent to LP's found in 1-d cumulants. The latter is deemed necessary to offset the low S/N ratio in the cumulant domains concomitant to performing short data segmentation in order to capture the LP's transient appearance. In this paper we summarize the procedures of feature selection for neural network training, modification to the back-propagation algorithm to speed its rate of convergence, and the pilot trial results of the neural ECG classifier.

IC991728.PDF (From Author) IC991728.PDF (Rasterized)
