Chair: Richard P. Lippmann, MIT Lincoln Laboratory (USA)
Arnaldo J. Abrantes, INESC (PORTUGAL)
Jorge S. Marques, INESC (PORTUGAL)
Snakes, elastic nets, and Kohonen networks are well-known algorithms that were developed in different contexts. They nevertheless share common features, which raises the question of how they are related and suggests applying each of them to problems traditionally tackled by only one. This paper addresses the problem of edge linking and proposes a new class of non-linear recursive algorithms, based on a general cost function, which includes snakes, Kohonen maps, and elastic nets as special cases. This class provides a unified framework for several existing algorithms in Pattern Recognition and Active Contours and allows the design of new recursive schemes.
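As a point of reference for one of the special cases named above, the following is a minimal sketch of a standard Kohonen update on a one-dimensional chain of units, as might be used to pull an ordered contour toward detected edge points. It is not the general cost function proposed in the paper; the function name `kohonen_step`, the Gaussian neighbourhood, and the parameters `lr` and `sigma` are illustrative assumptions.

```python
import math


def kohonen_step(units, x, lr=0.5, sigma=1.0):
    """Move each unit of a 1-D chain toward the data point x.

    units: list of (x, y) tuples ordered along the contour (assumed layout).
    The winner is the unit closest to x; the other units move less,
    weighted by a Gaussian of their index distance to the winner.
    Returns the updated chain and the winner index.
    """
    winner = min(
        range(len(units)),
        key=lambda i: (units[i][0] - x[0]) ** 2 + (units[i][1] - x[1]) ** 2,
    )
    updated = []
    for i, (ux, uy) in enumerate(units):
        # Neighbourhood weight: 1.0 at the winner, decaying along the chain.
        h = math.exp(-((i - winner) ** 2) / (2.0 * sigma ** 2))
        updated.append((ux + lr * h * (x[0] - ux), uy + lr * h * (x[1] - uy)))
    return updated, winner
```

For example, with `units = [(0, 0), (1, 0), (2, 0)]` and an edge point `x = (1, 1)`, the middle unit wins and moves halfway to the point, while its two neighbours move by a smaller amount.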
Lawrence W. Cychosz, University of Wisconsin (USA)
Jun Zhang, University of Wisconsin (USA)
Two new face recognition systems are proposed using auto-associative back-propagation neural network feature extractors on facial regions in conjunction with key facial structure measurements. A third proposed system combines the first two systems using confidence measurements to select a best match. A Cottrell/Fleming face recognition network and a structural face data network are also implemented and evaluated. A training set of 60 images and a test set of 12 images, acquired under uncontrolled conditions, were used to evaluate system performance. The first two proposed systems correctly selected 67 percent of the training images when presented with the test image set. The third proposed system achieved a recognition rate of 75 percent. By comparison, the Cottrell/Fleming network and the structural data network achieved recognition rates of 25 percent and 8 percent, respectively.
Terry McElroy, Raytheon Company (USA)
Elizabeth Wilson, Raytheon Company (USA)
Gretel Anspach, Raytheon Company (USA)
There is a pressing need for sign-language-to-English translation capability, both to compensate for the shortage of sign language interpreters and to provide an aid for training. A modular hybrid design is under development that applies various techniques, including neural networks, to a translation system that can facilitate communication between deaf and hearing people, as part of an overall system to automatically translate American Sign Language to spoken English. The key features to be analyzed are hand motion, hand location with respect to the body, and handshape. In this paper, a neural network is used to recognize and classify alphanumeric handshapes using Fourier Descriptor coefficients as an input vector. The algorithm is described and results are shown for applying this technique to experimental images.
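To illustrate the kind of input vector mentioned above, here is a minimal sketch of one common way to compute Fourier descriptors from a closed contour. The abstract does not specify the normalization the authors use; the choices below (dropping the DC term, taking magnitudes, dividing by the first harmonic) are assumptions, as are the function name `fourier_descriptors` and the requirement that the contour be traversed counter-clockwise.

```python
import cmath
import math


def fourier_descriptors(contour, n_coeffs):
    """Return n_coeffs magnitude descriptors of a closed contour.

    contour: list of (x, y) boundary points in counter-clockwise order
    (an assumption of this sketch). Dropping the DC term removes
    translation, taking magnitudes removes rotation and start-point
    dependence, and dividing by the first harmonic removes scale.
    """
    n = len(contour)
    if n_coeffs + 2 > n:
        raise ValueError("contour has too few points for n_coeffs")
    z = [complex(x, y) for x, y in contour]

    def dft(k):
        # k-th coefficient of the discrete Fourier transform of the contour.
        return sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n

    scale = abs(dft(1))
    if scale == 0:
        raise ValueError("degenerate contour: first harmonic is zero")
    # Harmonics 2 .. n_coeffs + 1, normalized by the first harmonic.
    return [abs(dft(k)) / scale for k in range(2, n_coeffs + 2)]
```

The resulting list could serve as the input vector to a classifier; a rotated, scaled, translated, or re-indexed copy of the same contour yields the same descriptors up to floating-point error.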
Hideyuki Watanabe, ATR-ITL (JAPAN)
Tsuyoshi Yamaguchi, ATR-ITL (JAPAN)
Shigeru Katagiri, ATR-ITL (JAPAN)
This paper proposes a new approach to pattern recognition, named Discriminative Metric Design (DMD). DMD optimizes discriminant functions with the Minimum Classification Error/Generalized Probabilistic Descent method (MCE/GPD) such that intrinsic features of each pattern class can be represented efficiently. The resulting metrics accordingly lead to robust recognizers. DMD is quite general: several existing methods, such as Learning Vector Quantization and the Continuous Hidden Markov Model, arise as special cases. Among many possibilities, the paper specifically elaborates the DMD formulation for the quadratic discriminant function and demonstrates its utility in a speaker-independent Japanese vowel recognition task.
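For readers unfamiliar with MCE, the sketch below shows a commonly used smoothed classification-error loss: a misclassification measure compares the best competing discriminant score with the score of the true class, and a sigmoid turns it into a differentiable approximation of the 0/1 error. This is a simplified form that uses the single best competitor, not the DMD formulation itself; the function name `mce_loss`, the convention that larger scores are better, and the slope `alpha` are assumptions.

```python
import math


def mce_loss(scores, true_class, alpha=1.0):
    """Smoothed 0/1 loss for one sample.

    scores: discriminant values, one per class (larger means better match,
    an assumption of this sketch). The measure d is positive when some
    competing class outscores the true class.
    """
    competitors = [s for j, s in enumerate(scores) if j != true_class]
    d = max(competitors) - scores[true_class]
    # The sigmoid maps d to (0, 1); a value of 0.5 marks the decision boundary.
    return 1.0 / (1.0 + math.exp(-alpha * d))
```

A gradient-based method such as GPD would then adjust the parameters of the discriminant functions to reduce this loss over the training samples.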
Ying Zhao, BBN Systems and Technologies (USA)
Richard Schwartz, BBN Systems and Technologies (USA)
Jason Sroka, Massachusetts Institute of Technology (USA)
John Makhoul, BBN Systems and Technologies (USA)
In this paper, we incorporate the Hierarchical Mixtures of Experts (HME) method of probability estimation, developed by Jordan, into an HMM-based continuous speech recognition system. The resulting system can be thought of as a continuous-density HMM system, but instead of using Gaussian mixtures, the HME system employs a large set of hierarchically organized but relatively small neural networks to perform the probability density estimation. The hierarchical structure is reminiscent of a decision tree except for two important differences: each "expert" or neural net performs a "soft" decision rather than a hard decision, and, unlike ordinary decision trees, the parameters of all the neural nets in the HME are automatically trainable using the EM algorithm. We report results on the ARPA 5,000-word and 40,000-word Wall Street Journal corpus using HME models.
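The "soft" decision described above can be sketched as a two-level mixture in which softmax gates weight the outputs of the experts instead of selecting a single branch. In an actual HME the gate logits and expert outputs would be produced by small networks from the input; here they are passed in as plain numbers, and the function names `softmax` and `hme_density` are illustrative assumptions rather than the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hme_density(top_gate_logits, branch_gate_logits, expert_densities):
    """Two-level mixture: p(x) = sum_i g_i * sum_j g_(j|i) * p_ij(x).

    top_gate_logits:    one logit per branch (hypothetical gate output).
    branch_gate_logits: for each branch, one logit per expert.
    expert_densities:   for each branch, each expert's density value at x.
    """
    top = softmax(top_gate_logits)
    total = 0.0
    for g_i, logits, densities in zip(top, branch_gate_logits, expert_densities):
        branch = softmax(logits)
        # Every expert contributes, weighted by its gate: a soft decision.
        total += g_i * sum(g * p for g, p in zip(branch, densities))
    return total
```

With all gate logits equal, the result is the plain average of the expert outputs; as one logit grows, the mixture approaches the hard decision a conventional decision tree would make.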
Bassel Solaiman, E.N.S.T. Bretagne
Eric P. Maillard, S.T.S.N./G.E.S.M.A. (FRANCE)
We apply a new neural network, HLVQ, which combines supervised and unsupervised learning, to vector quantization. Supervised learning based on Learning Vector Quantization 2 performs attention focusing over a background of a Self-Organizing Feature Map algorithm. The network exhibits the salient features of both algorithms: the topology-preserving mapping characteristic is acquired through unsupervised learning, while supervised learning keeps the overlap between classes to a minimum. Pattern labeling is carried out by a separate unsupervised network taking as input the discrete cosine transform of a pattern. First, the labeling network is trained on the transform of sub-images. Each neuron of this network is considered as the prototype of one class. Once convergence is achieved, HLVQ is trained. Each sub-image is input to the network. The class of the input pattern is determined by the most activated neuron of the labeling network on the presentation of the sub-image transform.
Changyi Sun, Utah State University
Heng-Da Cheng, Utah State University
Jeffery J. McDonnell, SUNY
Christopher M.U. Neale, Utah State University (USA)
The Special Sensor Microwave/Imager (SSM/I) radiometer is well suited to monitoring snow conditions because of its sensitive response to changes in snow properties. A single-hidden-layer artificial neural network (ANN) was employed to accomplish this remote sensing task, with radiometric observations of brightness temperatures (Tb's) as input data, to derive information about snow. Error back-propagation learning was applied to train the ANN. After learning the mapping of SSM/I Tb's to snow classes, the ANN approach showed significant promise for identifying mountainous snow conditions. Error rates were 3% for snow-free, 5% for dry snow, 9% for wet snow, and 0% for refrozen snow. This study indicates the potential of ANN supervised learning for the inversion of snow conditions from SSM/I observations. Further improvement in the application of ANNs to large-scale snow monitoring can be expected by using more training data derived from both plains and mountain regions.
Chien-Hsien Wu, National Cheng Kung University (REPUBLIC OF CHINA)
Ching-Wen Lo, National Cheng Kung University (REPUBLIC OF CHINA)
Jhing-Fa Wang, National Cheng Kung University (REPUBLIC OF CHINA)
This paper describes a Computer-aided Heart sound Analysis and Classification System (CHACS) based on neural networks and time analysis. The system comprises two subsystems, covering the time and frequency domains. In the first subsystem, a multi-layer perceptron neural network is adopted to classify heart sound patterns. In the second subsystem, a set of heuristic rules is used to characterize heart sounds. The individual classification results of these two subsystems are combined to give the final suggestion. Using this system, heart sounds can be selectively stored, retrieved, enhanced, and replayed. In addition, CHACS provides an on-line display of the heart beat rate and allows an objective and reliable classification of heart sounds. Experimental results show that a classification rate of 95.6% is obtained.
R.I. Damper, University of Southampton (UK)
Responses of both human and animal listeners to synthetic stop-consonant/vowel stimuli in which voice-onset time (VOT) is uniformly varied are known to be 'categorical', but an explanation of this phenomenon remains elusive. A 'composite' model consisting of a physiologically-realistic auditory model feeding its patterns of neural firing to an artificial neural network is described. When trained by (supervised) error back-propagation on the extreme endpoints of the VOT continuum, the composite model is capable of closely reproducing listeners' behaviour in classical categorical-perception (CP) studies. However, whether the model also reproduces the so-called boundary-shift phenomenon -- whereby the phoneme boundary moves with place of articulation -- apparently depends upon precise details of the auditory model and so, by implication, upon subtle aspects of peripheral auditory processing. A first attempt at unsupervised training has been unsuccessful; the likely reason for this is outlined. It is anticipated that future work comparing the model's responses for unsupervised versus supervised training will help to elucidate the mechanisms of categorical perception.