

Morena Danieli - Loquendo (Italy)

Title: Evaluating Dialogue Strategies and User Behavior

Abstract: As the use of spoken and multimodal dialogue systems becomes more widespread, the need for accurate and flexible evaluation frameworks becomes crucial. During the past few years, this subject has been the meeting point of different approaches, including human factors studies, methodologies based on speech recognition metrics, and approaches that combine task success measures and user appreciation into a single performance function. That work has had a significant impact in establishing new trends in best practice for the design of spoken language systems. Current evaluation methodologies for dialogue systems share some concepts. Notions such as task success, task efficiency, and low cost in terms of cognitive load play a central role in most of them. These are useful concepts for correlating predictors of user satisfaction with the performance reports of dialogue systems. However, the need to predict user satisfaction often arises in the early design phases of spoken dialogue systems, when sufficient and reliable experimental data are not usually available.

The general idea of the present paper is that in the early design phases of spoken dialogue systems it is worthwhile to evaluate the user's ease in interacting with different dialogue strategies, rather than the efficiency of the dialogue system in providing the user with the required information. The methodology I propose is based on the assumption that the success of a task-oriented dialogue system greatly depends on its ability to provide a meaningful match between the user's expectations about the task and the system's capabilities in solving it. This assumption implies that spoken dialogue strategies offering a good trade-off between expectations and system capabilities improve the user's effectiveness in using the system.

The evaluation methodology requires three steps. The goal of the first step is to identify the different tokens and relations that constitute the user's mental model of the task. It is based on the study of corpus data collected from real situations involving the same task as the prospective dialogue system. The nature of the corpus can vary greatly depending on the task: for example, for the most common applications of spoken dialogue systems it can consist of transcripts of human-human dialogues on the same task. For other tasks, such as web navigation, the corpus can include collections from different input sources (visual data, records of transitions from one content page to another, and so on). The different elements of the user's mental model of the task are then taken into account in the design of the dialogue strategy and system prompts. Once tokens and relations have been used to design one or more dialogue strategies, the evaluation enters its second step, a between-group experiment. Each strategy is tried by a representative set of experimental subjects, either in a real context of use or in a Wizard-of-Oz experimental setting. Here well-established techniques of usability evaluation can be exploited, and direct measures of user behavior are recorded, including user response times and user response contents. At the end of their session with the dialogue system, users are interviewed in order to capture the perceived task success. The third step consists of measuring user effectiveness in providing the spoken dialogue system with the information it needs to solve the task.
This evaluation uses the kappa coefficient, which is calculated from a confusion matrix reporting the data obtained during the experiments. The way the confusion matrix is used here differs from previous examples because it is the users, rather than a dialogue strategy, who are evaluated, i.e. how effectively and easily subjects can use a given dialogue strategy to complete the task. Evaluation of surveys and other direct measures of user behavior is also carried out at this stage. In the paper, I will argue that applying this three-step evaluation method may increase our understanding of the user's mental model of a task during the early stages of development of a spoken language agent, and I will report experimental data supporting this claim.
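As a minimal illustration of the third step, the sketch below computes the kappa coefficient from a confusion matrix of raw counts. The function, the matrix orientation (rows for the values subjects actually provided, columns for the values the task required), and the example counts are illustrative assumptions, not taken from the paper:

def kappa(confusion):
    # Cohen's kappa for a square confusion matrix of raw counts.
    total = sum(sum(row) for row in confusion)
    # Observed agreement: proportion of counts on the main diagonal.
    p_obs = sum(confusion[i][i] for i in range(len(confusion))) / total
    # Expected agreement: chance agreement estimated from row and column marginals.
    rows = [sum(row) / total for row in confusion]
    cols = [sum(col) / total for col in zip(*confusion)]
    p_exp = sum(r * c for r, c in zip(rows, cols))
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical counts for one dialogue strategy over three information slots.
strategy_a = [[40, 3, 2],
              [4, 38, 3],
              [2, 2, 6]]
print(round(kappa(strategy_a), 3))  # about 0.73: agreement well above chance

Read this way, a higher kappa for one dialogue strategy than for another would suggest that subjects supplied the information the system needed more reliably under that strategy.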

Curriculum: Morena Danieli is a computational linguist who graduated from the University of Torino (Italy) and the University of Geneva (Switzerland). She has been working on the design and evaluation of spoken language systems for telecommunication applications since 1991. She worked on several European research projects on spoken dialogue, including ESPRIT-SUNDIAL and LRE-ARISE, and in international working groups on dialogue corpus annotation such as the Discourse Research Initiative. She is a co-founder of the Special Interest Group on Dialogue (SIGdial) of the Association for Computational Linguistics. At present, she is a technical leader at Loquendo S.p.A., Torino.

Danieli.ps Danieli.pdf