Authors IndexSessionsTechnical programAttendees

 

Session: Large Vocabulary (Language Modeling and Speech Understanding)

Title: A COMPARISON OF FOUR METRICS FOR AUTO-INDUCING SEMANTIC CLASSES

Authors: Andrew Pargellis, Eric Fosler-Lussier, Alexandros Potamianos, Chin-Hui Lee

Abstract: A speech understanding system typically includes a natural language understanding module that defines concepts, i.e., groups of semantically related words. It is a challenge to build a set of concepts for a new domain for which prior knowledge and training data are limited. In our work, concepts are induced automatically from unannotated training data by grouping semantically similar words and phrases together into concept classes. Four context-dependent similarity metrics are proposed and their performance for auto-inducing concepts is evaluated. Two of these metrics are based on the Kullback-Leibler (KL) distance measure, a third is the Manhattan norm, and the fourth is the vector product (VP) similarity measure. The KL and VP metrics consistently under-perform the other metrics on the four tasks investigated: movie information, a children's game, travel reservations, and Wall Street Journal news articles. Correct concept classification rates are up to 90 % for the movie task.

a01ap098.ps a01ap098.pdf