Audio-Visual Prediction of Head-Nod and Turn-Taking Events in Dyadic Interactions
Bekir Berker Türker, Engin Erzin, Yücel Yemez and Metin Sezgin
Abstract:
Head-nods and turn-taking both contribute significantly to conversational dynamics in dyadic interactions. Timely prediction and use of these events are valuable for dialog management systems in human-robot interaction. In this study, we present an audio-visual prediction framework for head-nod and turn-taking events that can also be used in real-time systems. Prediction systems based on Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) are trained on human-human conversational data. Unimodal and multimodal classification performances for head-nod and turn-taking events are reported on the IEMOCAP dataset.
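As a rough, hypothetical illustration of the LSTM-RNN prediction setup described above, the following sketch assumes fused audio-visual feature windows and a binary event/no-event decision; the feature dimension, window length, and hidden size are illustrative assumptions, not the authors' configuration.

import torch
import torch.nn as nn

# Minimal sketch: an LSTM-based predictor over fused audio-visual feature windows.
# All dimensions below are placeholders, not values from the paper.
class EventPredictor(nn.Module):
    def __init__(self, feat_dim=60, hidden_dim=64, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, time, feat_dim), a window of frames preceding a candidate event
        _, (h_n, _) = self.lstm(x)
        return self.fc(h_n[-1])  # logits for event vs. no-event

model = EventPredictor()
window = torch.randn(8, 50, 60)  # 8 hypothetical 50-frame audio-visual feature windows
logits = model(window)           # shape (8, 2)

An SVM baseline, as in the paper's comparison, would instead operate on a flattened or pooled version of the same windows.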
Cite as: Türker, B.B., Erzin, E., Yemez, Y., Sezgin, M. (2018) Audio-Visual Prediction of Head-Nod and Turn-Taking Events in Dyadic Interactions. Proc. Interspeech 2018, 1741-1745, DOI: 10.21437/Interspeech.2018-2215.
BibTeX Entry:
@inproceedings{Türker2018,
  author={Bekir Berker Türker and Engin Erzin and Yücel Yemez and Metin Sezgin},
  title={Audio-Visual Prediction of Head-Nod and Turn-Taking Events in Dyadic Interactions},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={1741--1745},
  doi={10.21437/Interspeech.2018-2215},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2215}
}