Session: IMDSP-L1
Time: 1:00 - 3:00, Tuesday, May 8, 2001
Location: Room 251 D
Title: Handwriting and Text Recognition Systems
Chair: Kris Popat

1:00, IMDSP-L1.1
A MULTI-SCALE AND MULTI-ORIENTATION RECOGNITION TECHNIQUE APPLIED TO DOCUMENT INTERPRETATION: APPLICATION TO FRENCH TELEPHONE NETWORK MAPS
F. ROUSSEAU, J. OGIER, C. CARIOU, R. MULLOT, J. LABICHE, J. GARDES, S. ADAM
In this paper, we consider the general problem of technical document interpretation, applied to the documents of the French telephone operator, France Telecom. More precisely, we focus on the computation and use of a new set of features that allow the classification of multi-oriented and multi-scaled patterns, as well as the estimation of their rotation and scale parameters. This set of invariants is based on the Fourier-Mellin transform, which has two interesting properties in the context of our study. First, it yields an excellent classification rate. Second, it provides a robust estimator of the orientation and scale of the processed shapes through the shift theorem of the Fourier transform.
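
The shift-theorem idea mentioned in this abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' estimator: it only assumes that, after a log-polar resampling, a rotation or scale change shows up as a circular shift along one axis, and it recovers that shift by phase correlation. The function names and the sample signal are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a short sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def estimate_circular_shift(reference, shifted):
    """Estimate d such that shifted[t] == reference[(t - d) % n].

    By the shift theorem, the DFT of the shifted signal equals the DFT of the
    reference multiplied by a linear phase term. Keeping only that phase and
    inverting it gives a peak at index d.
    """
    R = dft(reference)
    S = dft(shifted)
    phase = []
    for r, s in zip(R, S):
        c = s * r.conjugate()
        mag = abs(c)
        # Drop bins with (near-)zero energy instead of dividing by zero.
        phase.append(c / mag if mag > 1e-12 else 0j)
    corr = idft(phase)
    return max(range(len(corr)), key=lambda t: corr[t].real)


# Hypothetical profile, e.g. one row of a log-polar spectrum (assumption).
profile = [0, 1, 3, 2, 0, 0, 5, 1]
rotated = profile[-3:] + profile[:-3]  # circular shift by 3 samples
print(estimate_circular_shift(profile, rotated))
```

In a two-dimensional setting the same correlation would be run along both the angle axis and the log-radius axis, giving one shift for rotation and one for scale.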

1:20, IMDSP-L1.2
DECODING OF TEXT LINES IN GRAYSCALE DOCUMENT IMAGES
K. POPAT
The Document Image Decoding (DID) framework for recognizing printed text in images has been shown in previous work to achieve extremely high recognition accuracy when its models are well matched to the data. To date, DID has been restricted to binary images, in part for computational reasons, and in part because binary scanning is widely available and often of sufficient spatial resolution to make the use of grayscale information unnecessary for reliable recognition. Advances in computer speed and memory, along with the emergence of low-cost digital still cameras and similar devices as alternatives to traditional scanners, motivate the extension of the DID formalism to the low-spatial-resolution grayscale and color domains. Doing so requires substantially generalizing DID's image-formation and degradation models. This paper lays out an approach and presents results on real data.

1:40, IMDSP-L1.3
HIGH PERFORMANCE CHINESE OCR BASED ON GABOR FEATURES, DISCRIMINATIVE FEATURE EXTRACTION AND MODEL TRAINING
Q. HUO, Y. GE, Z. FENG
We have been developing a Chinese OCR engine for machine-printed documents. Currently, our OCR engine supports a vocabulary of 6921 characters, comprising 6707 simplified Chinese characters in GB2312-80, 12 frequently used GBK Chinese characters, 62 alphanumeric characters, and 140 punctuation marks and symbols. The supported font styles include Song, Fang Song, Kai, He, Yuan, LiShu, WeiBei, XingKai, etc. The average character recognition accuracy is above 99% for newspaper-quality documents, with a recognition speed of about 250 characters per second on a Pentium III 450 MHz PC and a memory footprint of less than 2 MB. In this paper, we describe the key technologies used to construct this recognizer. Among them, we highlight three techniques that contribute to the high recognition accuracy: Gabor features, discriminative feature extraction, and minimum classification error as a criterion for model training.

2:00, IMDSP-L1.4
HMM TOPOLOGY OPTIMIZATION FOR HANDWRITING RECOGNITION
D. LI, A. BIEM, J. SUBRAHMONIA
This paper addresses the problem of Hidden Markov Model (HMM) topology estimation in the context of on-line handwriting recognition. HMMs have been widely used in applications related to speech and handwriting recognition with great success. One major drawback of these approaches, however, is that the techniques they use for estimating the topology of the models (number of states, connectivity between the states, and number of Gaussians) are usually heuristically derived, with no guarantee of optimality. This paper addresses this problem by comparing a couple of commonly used heuristically derived methods to an approach that uses the Bayesian Information Criterion (BIC) to compute the optimal topology. Experimental results on discretely written letters show that using BIC gives results comparable to those of the heuristic approaches with a model that has nearly 10% fewer parameters.
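
As a rough illustration of BIC-based model selection, the sketch below scores candidate topologies with the standard form BIC = -2 ln L + k ln N and keeps the lowest score. It is only a sketch under stated assumptions: the candidate names, log-likelihoods, parameter counts, and sample count are invented for the example, and the abstract does not say how the authors search the space of topologies.

```python
import math


def bic(log_likelihood, num_params, num_samples):
    """Bayesian Information Criterion; lower values are better."""
    return -2.0 * log_likelihood + num_params * math.log(num_samples)


def select_topology(candidates, num_samples):
    """Return the candidate whose BIC score is smallest."""
    return min(
        candidates,
        key=lambda c: bic(c["log_likelihood"], c["num_params"], num_samples),
    )


# Hypothetical candidates (all numbers are made up for illustration).
candidates = [
    {"name": "4 states", "log_likelihood": -1300.0, "num_params": 40},
    {"name": "6 states", "log_likelihood": -1200.0, "num_params": 60},
    {"name": "8 states", "log_likelihood": -1195.0, "num_params": 80},
]

best = select_topology(candidates, num_samples=1000)
print(best["name"])
```

The penalty term k ln N is what lets a smaller model win when a larger one improves the likelihood only slightly, as with the 8-state candidate here.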

2:20, IMDSP-L1.5
TOWARD ISLAND-OF-RELIABILITY-DRIVEN VERY-LARGE-VOCABULARY ON-LINE HANDWRITING RECOGNITION USING CHARACTER CONFIDENCE SCORING
J. PITRELLI, J. SUBRAHMONIA, B. MAISON
We explore a novel approach for handwriting recognition tasks whose intrinsic vocabularies are too large to be applied fully as word-set constraints during recognition. Our approach applies word-set constraints and accounts for the fact that some parts of words may be written more recognizably than others. An initial pass is made with an HMM recognizer, without word-set constraints, generating a lattice of character-hypothesis arcs representing likely segmentations of the ink signal. Arc confidence scores are computed using a posteriori probabilities. The most confidently recognized characters are matched against the overall vocabulary to extract a word subset that constrains a second recognition pass. Results show that with an overall vocabulary of 273,000 words, we can limit subsets to 50,000 words and eliminate 29.6% of the word errors made by a one-pass recognizer without word-set constraints, and 11.5% of the errors made using a fixed 30,000-word set.
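
The two-pass idea can be sketched in a heavily simplified form. The code below replaces the character lattice with a flat list of slots, each holding competing character hypotheses and their log scores, and it assumes that a vocabulary word matches when the confident characters occur in it in order. Both simplifications, the threshold, and all names and data are assumptions for illustration, not the authors' method.

```python
import math


def posteriors(log_scores):
    """Normalize competing hypotheses' log scores into probabilities."""
    peak = max(log_scores.values())
    weights = {c: math.exp(s - peak) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def confident_characters(slots, threshold):
    """Keep the best character of each slot whose posterior meets the threshold."""
    islands = []
    for log_scores in slots:
        post = posteriors(log_scores)
        best = max(post, key=post.get)
        if post[best] >= threshold:
            islands.append(best)
    return islands


def is_subsequence(chars, word):
    """True if chars occur in word in the same order (not necessarily adjacent)."""
    remaining = iter(word)
    return all(ch in remaining for ch in chars)


def vocabulary_subset(vocabulary, islands):
    """Words compatible with the confident characters."""
    return [w for w in vocabulary if is_subsequence(islands, w)]


# Hypothetical first-pass output: the middle character is ambiguous.
slots = [
    {"c": -1.0, "e": -5.0},
    {"a": -2.0, "o": -2.1},
    {"t": -0.5, "l": -6.0},
]
islands = confident_characters(slots, threshold=0.9)
print(vocabulary_subset(["cat", "cot", "cut", "dog", "tack"], islands))
```

A second recognition pass would then be constrained to the returned subset rather than to the full vocabulary.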

2:40, IMDSP-L1.6
MINIMUM CLASSIFICATION ERROR TRAINING OF HIDDEN MARKOV MODELS FOR HANDWRITING RECOGNITION
A. BIEM
This paper evaluates the application of Minimum Classification Error (MCE) training to on-line handwritten text recognition based on Hidden Markov Models. We describe an allograph-based, character-level MCE training procedure aimed at minimizing the character error rate while allowing flexibility in writing style. Experiments on a writer-independent discrete character recognition task covering all alphanumeric characters and keyboard symbols show that MCE achieves a character error rate reduction of more than 30% compared to the baseline Maximum Likelihood-based system.
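
For readers unfamiliar with MCE, the sketch below shows one common textbook formulation of the per-sample loss: a misclassification measure that compares the correct class score with a smoothed maximum over the rival scores, passed through a sigmoid. The abstract does not give the exact formulation used in the paper, so the smoothing constants eta and gamma, the function names, and the scores are assumptions.

```python
import math


def misclassification_measure(scores, correct, eta=1.0):
    """Smoothed best rival score minus the correct class score.

    Negative values mean the correct class is ahead of its rivals.
    """
    rivals = [s for label, s in scores.items() if label != correct]
    peak = max(rivals)
    # (1/eta) * log of the mean of exp(eta * score), computed stably.
    mean_exp = sum(math.exp(eta * (s - peak)) for s in rivals) / len(rivals)
    soft_max = peak + math.log(mean_exp) / eta
    return soft_max - scores[correct]


def mce_loss(scores, correct, eta=1.0, gamma=1.0):
    """Sigmoid of the misclassification measure, a smooth 0/1 error count."""
    d = misclassification_measure(scores, correct, eta)
    return 1.0 / (1.0 + math.exp(-gamma * d))


# Hypothetical log-likelihood scores from three character models.
scores = {"a": -10.0, "b": -12.0, "c": -15.0}
print(mce_loss(scores, correct="a"))  # correct class ranked first
print(mce_loss(scores, correct="c"))  # correct class ranked last
```

Because the loss is differentiable in the scores, it can be minimized by gradient descent on the model parameters, which is what distinguishes it from a maximum likelihood criterion that looks only at the correct class.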