Chair: Mark A. Clements, Georgia Institute of Technology (U.S.A.)
Eric Brill, Johns Hopkins University (U.S.A.)
In order to continue building systems with progressively more complex natural language capabilities, it is crucial that great strides be made toward solving the core linguistic analysis problems for complex and possibly unrestricted domains. A great deal of progress has been made by applying machine learning techniques to automatically train systems from manually annotated corpora to provide detailed linguistic analyses of sentences. This paper examines a number of issues within this paradigm of automatic linguistic knowledge acquisition and how they relate to advancing the field of natural language processing over the next decade.
Ronald A. Cole, Oregon Graduate Institute of Science and Technology (U.S.A.)
Stephen Sutton, Oregon Graduate Institute of Science and Technology (U.S.A.)
Yonghong Yan, Oregon Graduate Institute of Science and Technology (U.S.A.)
Pieter Vermeulen, Oregon Graduate Institute of Science and Technology (U.S.A.)
Mark Fanty, Oregon Graduate Institute of Science and Technology (U.S.A.)
In this paper, we argue for a paradigm shift in spoken language technology, from transcription tasks to interactive systems. The current paradigm evaluates speech recognition accuracy on large vocabulary transcription tasks, such as telephone conversations or media broadcasts. Systems are evaluated in international competitions, with strict rules for participation and well-defined evaluation metrics. Participation in these competitions is limited to a few elite laboratories that have the resources to develop and field systems. We propose a new, more productive and more accessible paradigm for spoken language research, in which research advances are evaluated in the context of interactive systems that allow people to perform useful tasks, such as accessing information from the World Wide Web while driving a car. These systems are made available for daily use by ordinary citizens through telephone networks or placement in easily accessible kiosks in public institutions. It is argued [1,2,3] that this new paradigm, which focuses on the goal of universal access to information for all people, better serves the needs of the research community, as well as the welfare of our citizens. We discuss the challenges and rewards of an interactive system approach to spoken language research, and describe our initial attempts to stimulate a paradigm shift and engage a large community of researchers through free distribution of the CSLU Toolkit.
Steven Greenberg, International Computer Science Institute (U.S.A.)
Automatic speech recognition in the twenty-first century will strive to emulate many properties of human speech understanding that lie beyond the capability of present-day systems. Such future-generation recognition will require massive amounts of empirical data in order to derive the organizational principles underlying the generation and decoding of spoken language. These data can be efficiently collected through systematic computational experimentation designed to identify the important building blocks of speech and delineate the nature of the structural interactions among linguistic tiers associated with the extraction of semantic information.
Robert C. Moore, SRI International (U.S.A.)
To achieve widespread acceptance, speech understanding technology needs to be domain independent. Deep understanding, however, appears to require knowledge that is domain specific. Speech understanding technology, therefore, must be partitioned into domain-independent and domain-specific components. Development of domain-independent components could be promoted by creation of semantically annotated corpora. Any such corpus, however, would be difficult to produce and would necessarily be controversial because of lack of widespread agreement on principles of semantic analysis. The use of such a corpus for performance evaluation should therefore be left largely up to the research community rather than being imposed by funding agencies.
R. K. Moore, DERA Speech Research Unit (U.K.)
Despite the significant theoretical and practical advances that have been made in automatic speech recognition in recent years, relatively little effort has been devoted to the evaluation of speech in an interactive multi-modal application interface. This paper introduces a general methodology for assessing speech-based systems and concludes with a proposal for a test scenario which focuses on the understanding component of a spoken language system.
Harald Aust, Philips Speech Processing (Germany)
Hermann Ney, RWTH Aachen, University of Technology (Germany)
An important aspect of creating high-performance natural language dialog systems is the question of how they are evaluated. While a universally accepted method exists for pure speech recognition, no such method has been established for speech understanding or dialog systems. We describe the methods we typically use for our systems and argue that it is not sufficient to evaluate their constituents separately. Instead, a measure for the system in its entirety is needed.
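The universally accepted measure for pure speech recognition referred to here is, in standard practice, word error rate (WER): insertions, deletions, and substitutions from an edit-distance alignment of the hypothesis against the reference, divided by the reference length. The sketch below is illustrative only and is not drawn from the paper; the function and example transcripts are hypothetical.

```python
# Minimal sketch of word error rate (WER), the standard measure for
# pure speech recognition. Names and example strings are illustrative.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j          # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    # One deletion ("me") and one substitution ("boston" -> "austin"):
    # 2 errors over 5 reference words = 0.4
    print(word_error_rate("show me flights to boston",
                          "show flights to austin"))
```

No comparably simple alignment-based score captures whether a dialog system actually understood the user and completed the task, which is why the authors argue for evaluating the system as a whole.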
Kazuyo Tanaka, Electrotechnical Laboratory (Japan)
In this paper, we discuss several major speech recognition applications that will contribute to human activities over the coming decade. First, recent Japanese speech-related national projects directed toward future intelligent systems are briefly reviewed. Then we discuss three systems as the next major speech applications: substantially robust systems, multimodal interaction systems, and multilingual dialogue systems. Evaluation of the performance of these systems is discussed separately from the viewpoints of both total systems and specific technologies. We suggest that the degree of difficulty of certain specific tasks can be measured even more precisely, while total system performance evaluation will become more difficult in future complex systems. Finally, we take up phrase spotting, distance calculation for phonetic symbol sequences, adaptation/learning, and software modularization/multi-agents as the key techniques for constructing the above applications.