Comparison of an End-to-end Trainable Dialogue System with a Modular Statistical Dialogue System
Norbert Braunschweiler and Alexandros Papangelis
Abstract:
This paper presents a comparison of two dialogue systems: one is end-to-end trainable and the other uses a more traditional, modular architecture. End-to-end trainable dialogue systems have recently attracted a lot of attention because they offer several advantages over traditional systems. One of them is that they avoid training each system module independently, instead using a single network architecture that maps an input to the corresponding output without the need for intermediate representations. While the end-to-end system investigated here had been tested in a text-in/out scenario, it remained an open question how the system would perform in a speech-in/out scenario, with noisy input from a speech recognizer and output speech generated by a speech synthesizer. To evaluate this, both dialogue systems were trained on the same corpus of human-human dialogues in the Cambridge restaurant domain and then compared in both scenarios by human evaluation. The results show that, in both scenarios, the end-to-end system receives significantly higher ratings on all metrics than the traditional modular system, an indication that it enables users to reach their goals faster and to experience both more natural system responses and better comprehension by the dialogue system.
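To make the architectural contrast in the abstract concrete, the following is a minimal, purely illustrative Python sketch (not the systems evaluated in the paper): a modular pipeline passes explicit intermediate representations between separately built stages, whereas an end-to-end system is a single trainable mapping from user input to system output. All function names, toy rules, and example utterances below are hypothetical.

```python
# Conceptual sketch only: contrasts a modular dialogue pipeline with an
# end-to-end mapping. The stages and rules here are hypothetical stand-ins,
# not the actual components compared in the paper.

def nlu(user_text: str) -> dict:
    """Modular stage 1: map text to a semantic frame (intermediate representation)."""
    frame = {"intent": "inform", "slots": {}}
    if "cheap" in user_text.lower():
        frame["slots"]["pricerange"] = "cheap"
    return frame

def policy(frame: dict) -> str:
    """Modular stage 2: choose a dialogue act given the semantic frame."""
    if frame["slots"].get("pricerange"):
        return "offer(name=example_restaurant)"
    return "request(pricerange)"

def nlg(dialogue_act: str) -> str:
    """Modular stage 3: realise the dialogue act as a system utterance."""
    templates = {
        "request(pricerange)": "What price range are you looking for?",
        "offer(name=example_restaurant)": "Example Restaurant is a cheap place in the centre.",
    }
    return templates[dialogue_act]

def modular_system(user_text: str) -> str:
    # Each module is designed or trained separately and communicates via
    # explicit intermediate representations (frames, dialogue acts).
    return nlg(policy(nlu(user_text)))

def end_to_end_system(user_text: str) -> str:
    # A single trainable mapping from input text to output text; in practice
    # this would be one neural network, stubbed here with a lookup table.
    responses = {
        "i want a cheap restaurant": "Example Restaurant is a cheap place in the centre.",
    }
    return responses.get(user_text.lower(), "What price range are you looking for?")

if __name__ == "__main__":
    utterance = "I want a cheap restaurant"
    print("modular   :", modular_system(utterance))
    print("end-to-end:", end_to_end_system(utterance))
```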
Cite as: Braunschweiler, N., Papangelis, A. (2018) Comparison of an End-to-end Trainable Dialogue System with a Modular Statistical Dialogue System. Proc. Interspeech 2018, 576-580, DOI: 10.21437/Interspeech.2018-1679.
BibTeX Entry:
@inproceedings{Braunschweiler2018,
  author={Norbert Braunschweiler and Alexandros Papangelis},
  title={Comparison of an End-to-end Trainable Dialogue System with a Modular Statistical Dialogue System},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={576--580},
  doi={10.21437/Interspeech.2018-1679},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1679}
}