ICASSP99 Application - Pattern Recognition & Speech Processing

Application - Pattern Recognition & Speech Processing
Frequency Recovery Of Narrow-Band Speech Using Adaptive Spline Neural Networks
Authors:
Aurelio Uncini, Dipartimento di Elettronica e Automatica - Università di Ancona, Ancona, Italy
Francesco Gobbi, Dipartimento di Elettronica e Automatica - Università di Ancona, Ancona, Italy
Francesco Piazza, Dipartimento di Elettronica e Automatica - Università di Ancona, Ancona, Italy
Page (NA) Paper number 1915
Abstract: In this paper a new system for speech quality enhancement (SQE) is presented. An SQE system attempts to recover the high and low frequencies from a narrow-band speech signal, usually working as a post-processor at the receiver side of a transmission system. The new system operates directly in the frequency domain using complex-valued neural networks. In order to reduce the computational burden and improve the generalization capabilities, a new architecture based on a recently introduced neural network, called adaptive spline neural network (ASNN), is employed. Experimental results demonstrate the effectiveness of the proposed method.

Diphone Multi-Trajectory Subspace Models
Authors: Klaus Reinhard, Mahesan Niranjan
Page (NA) Paper number 1932
Abstract: In this paper we report on the extension of capturing speech transitions embedded in diphones using trajectory models. The slowly varying dynamics of spectral trajectories carry much discriminant information that is very crudely modelled by traditional approaches such as HMMs. We improved our methodology of explicitly capturing the trajectory of short-time spectral parameter vectors by introducing multi-trajectory concepts in a probabilistic framework. Optimal subspace selection is presented which finds the most discriminant plane for classification. Using the E-set from the TIMIT database, results suggest that discriminant information is preserved in the subspace.
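
As a rough illustration of what selecting a discriminant plane involves, the sketch below scores candidate planes with a Fisher-style ratio of between-class to within-class scatter and keeps the best one. It is a simplification under stated assumptions, not the method of the paper: only axis-aligned planes (pairs of feature dimensions) are searched, and the function names and toy data are hypothetical.

```python
from itertools import combinations


def fisher_score(samples_by_class, dims):
    """Between-class scatter divided by within-class scatter, summed over dims."""
    all_rows = [row for rows in samples_by_class.values() for row in rows]
    between = 0.0
    within = 0.0
    for d in dims:
        grand_mean = sum(row[d] for row in all_rows) / len(all_rows)
        for rows in samples_by_class.values():
            class_mean = sum(row[d] for row in rows) / len(rows)
            # Spread of the class means around the overall mean.
            between += len(rows) * (class_mean - grand_mean) ** 2
            # Spread of the samples around their own class mean.
            within += sum((row[d] - class_mean) ** 2 for row in rows)
    return between / within


def best_axis_plane(samples_by_class, n_dims):
    """Return the pair of dimensions with the highest Fisher-style score."""
    return max(combinations(range(n_dims), 2),
               key=lambda dims: fisher_score(samples_by_class, dims))


# Hypothetical toy data: two classes, three features.
# Features 0 and 2 separate the classes; feature 1 is mostly noise.
toy = {
    "b": [(0.0, 5.0, 0.1), (0.2, -5.0, 0.0), (0.1, 0.0, -0.1)],
    "d": [(2.0, 4.0, 2.1), (2.2, -4.0, 1.9), (2.1, 0.0, 2.0)],
}
plane = best_axis_plane(toy, n_dims=3)
```

A search over arbitrary (rotated) planes would need an eigen-decomposition of the scatter matrices; the axis-aligned search is used here only to keep the example short.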

Speaker Recognition With A MLP Classifier And LPCC Codebook
Authors: Daniel Rodriguez-Porcheron, Marcos Faúndez-Zanuy
Page (NA) Paper number 1290
Abstract: This paper improves on the speaker recognition rates obtained with an MLP classifier or an LPCC codebook alone by using a linear combination of both methods. In our simulations we have obtained an improvement of 4.7% over an LPCC codebook of 32 vectors and 1.5% for a codebook of 128 vectors (error rate drops from 3.68% to 2.1%). We also propose an efficient algorithm that reduces the computational complexity of the LPCC-VQ system by a factor of 4.

Using Boosting to Improve a Hybrid HMM/Neural Network Speech Recognizer
Authors: Holger Schwenk
Page (NA) Paper number 2368
Abstract: "Boosting" is a general method for improving the performance of almost any learning algorithm. A recently proposed and very promising boosting algorithm is AdaBoost. In this paper we investigate whether AdaBoost can be used to improve a hybrid HMM/neural network continuous speech recognizer. Boosting significantly improves the word error rate from 6.3% to 5.3% on a test set of the OGI Numbers95 corpus, a medium-size continuous numbers recognition task. These results compare favorably with other combining techniques using several different feature representations or additional information from longer time spans.

Size Matters: An Empirical Study Of Neural Network Training For Large Vocabulary Continuous Speech Recognition
Authors: Dan Ellis, Nelson Morgan
Page (NA) Paper number 2400
Abstract: We have trained and tested a number of large neural networks for the purpose of emission probability estimation in large vocabulary continuous speech recognition. In particular, the problem under test is the DARPA Broadcast News task.
Our goal here was to determine the relationship between training time, word error rate, size of the training set, and size of the neural network. In all cases, the network architecture was quite simple, comprising a single large hidden layer with an input window consisting of feature vectors from 9 frames around the current time, with a single output for each of 54 phonetic categories. Thus far, simultaneous increases to the size of the training set and the neural network improve performance; in other words, more data helps, as does the training of more parameters. We continue to be surprised that such a simple system works as well as it does for complex tasks. Given a limitation in training time, however, there appears to be an optimal ratio of training patterns to parameters of around 25:1 in these circumstances. Additionally, doubling the training data and system size appears to provide diminishing returns in error rate reduction for the largest systems.

Oriented Soft Localized Subspace Classification
Authors: Thiagarajan Balachander, Ravi Kothari
Page (NA) Paper number 1212
Abstract: Subspace methods of pattern recognition form an interesting and popular classification paradigm. The earliest subspace method of classification was the CLass Featuring Information Compression (CLAFIC) method, which associated a linear subspace with each class. Local subspace classification methodologies, which have enhanced classification power by associating multiple linear subspaces with each class, have also been investigated. In this paper, we introduce the Oriented Soft Regional Subspace Classifier (OS-RSC).
The highlights of this classifier are: (i) class-specific subspaces are formed to specifically maximize the average projection of one class while minimizing that of the rival class; (ii) multiple manifolds are formed for each class, increasing classification power; and (iii) soft sharing of the training patterns again allows for consistent classification performance. It turns out that the cost function for forming class-specific subspaces is maximized for a subspace of unit dimensionality. The performance of the proposed classifier is tested on real-world classification problems.

The Separability Theory Of Hyperbolic Tangent Kernels And Support Vector Machines For Pattern Classification
Authors: Mathini Sellathurai, Simon Haykin
Page (NA) Paper number 1935
Abstract: In this paper, a new theory is developed for the feature spaces of the hyperbolic tangent used as an activation kernel for non-linear support vector machines. The theory developed herein is based on the distinct features of hyperbolic geometry, which leads to an interesting geometrical interpretation of the higher-dimensional feature spaces of neural networks using the hyperbolic tangent as the activation function. The new theory is used to explain the separability of hyperbolic tangent kernels, where we show that separability is possible only for a certain class of hyperbolic kernels. Simulation results are given supporting the separability theory developed in this paper.

Multi-Category Classification by Kernel Based Nonlinear Subspace Method
Authors: Eisaku Maeda, Hiroshi Murase
Page (NA) Paper number 1595
Abstract: The Kernel based Nonlinear Subspace (KNS) method is proposed for multi-class pattern classification. This method consists of the nonlinear transformation of feature spaces defined by kernel functions and the subspace method in the transformed high-dimensional spaces.
The Support Vector Machine, a nonlinear classifier based on a kernel function technique, shows excellent classification performance; however, its computational cost increases exponentially with the number of patterns and classes. The linear subspace method is a technique for multi-category classification, but it fails when the pattern distribution has nonlinear characteristics or the feature space dimension is low compared to the number of classes. The proposed method combines the advantages of both techniques and realizes multi-class nonlinear classifiers with better performance in less computational time. In this paper, we show that a nonlinear subspace method can be formulated by nonlinear transformations defined through kernel functions and that its performance is better than that obtained by conventional methods.

Ensemble Classification by Critic-driven Combining
Authors: David J Miller, Lian Yan
Page (NA) Paper number 2070
Abstract: We develop new rules for combining estimates obtained from each classifier in an ensemble. A variety of combination techniques have been previously suggested, including averaging probability estimates, as well as hard voting schemes. We introduce a critic associated with each classifier, whose objective is to predict the classifier's errors. Since the critic only tackles a two-class problem, its predictions are generally more reliable than those of the classifier, and thus can be used as the basis for our suggested improved combination rules. While previous techniques are only effective when the individual classifier error rate is p < 0.5, the new approach is successful, as proved under an independence assumption, even when this condition is violated: in particular, so long as p + q < 1, with q the critic's error rate.
More generally, critic-driven combining achieves consistent, substantial performance improvement over alternative methods on a number of benchmark data sets.

Highly Accurate Higher Order Statistics Based Neural Network Classifier Of Specific Abnormality In Electrocardiogram Signals
Authors:
Madiha Sabry-Rizk, EEIE Dept, City University, London, UK
Walid A Zgallai, EEIE Dept, City University, London, UK
Sahar El-Khafif, EEIE Dept, City University, London, UK
Ewart R Carson, MIM Centre, City University, London, UK
Kenneth T Grattan, EEIE Dept, City University, London, UK
Peter Thompson, Royal Free Hospital, London, UK
Page (NA) Paper number 1728
Abstract: The paper describes a simple yet highly accurate multi-layer feed-forward neural network classifier (based on the back-propagation algorithm) specifically designed to distinguish between normal and abnormal higher-order statistics features of electrocardiogram (ECG) signals. The ECG abnormality of concern is associated with ventricular late potentials (LPs) indicative of life-threatening heart diseases. LPs are defined as signals from areas of delayed conduction which outlast the normal QRS period (80-100 msec). The QRS along with the P and T waves constitute the heart beat cycle. This classifier incorporates both pre-processing and adaptive weight adjustments across the input layer during the training phase of the network to enhance extraction of features pertinent to LPs found in 1-d cumulants. The latter is deemed necessary to offset the low S/N ratio in the cumulant domains concomitant to performing short data segmentation in order to capture the LPs' transient appearance.
In this paper we summarize the procedures of feature selection for neural network training, the modification to the back-propagation algorithm to speed its rate of convergence, and the pilot trial results of the neural ECG classifier.
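
The "1-d cumulants" mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which one-dimensional slice is used, so the code assumes the diagonal slice C3(tau, tau) of the third-order cumulant, which for a mean-removed signal equals E[x(n) x(n+tau)^2]. The function name, the biased (divide by N) estimator, and the sample segment are choices made for this illustration, not details taken from the paper.

```python
def third_order_cumulant_diagonal(signal, max_lag):
    """Biased estimate of C3(tau, tau) for tau = 0..max_lag."""
    n = len(signal)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(signal)")
    mean = sum(signal) / n
    # The third-order cumulant equals the third-order moment only
    # once the mean has been removed.
    z = [value - mean for value in signal]
    diagonal = []
    for tau in range(max_lag + 1):
        acc = sum(z[i] * z[i + tau] * z[i + tau] for i in range(n - tau))
        diagonal.append(acc / n)  # biased estimator: divide by N, not N - tau
    return diagonal


# Hypothetical short data segment (not real ECG data).
segment = [0.0, 0.1, 1.0, 0.1, 0.0, -0.2, 0.0, 0.05]
features = third_order_cumulant_diagonal(segment, max_lag=3)
```

Short segments give few products per lag, so each estimate is noisy, which is consistent with the low S/N ratio in the cumulant domain that the abstract attributes to short data segmentation.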