Interspeech 2018

Combined Speaker Clustering and Role Recognition in Conversational Speech

Nikolaos Flemotomos, Pavlos Papadopoulos, James Gibson and Shrikanth Narayanan

Abstract:

Speaker Role Recognition (SRR) is usually addressed either as an independent classification task or as a step following a speaker clustering module. However, the first approach does not take speaker-specific variability into account, while the second one suffers from error propagation. In this work, we propose integrating an audio-based speaker clustering algorithm and a language-aided role recognizer into a meta-classifier that takes both modalities into account. This way, speaker-specific and role-specific characteristics can be treated separately before the relevant information is combined. The method is evaluated on two corpora, recorded under different conditions, of interactions between a clinician and a patient, and it is shown to yield superior results for the SRR task.
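The abstract does not specify how the meta-classifier combines the two modalities, so the following is only an illustrative sketch, not the authors' method. It assumes soft speaker-cluster memberships per segment (from some audio clusterer) and per-segment role posteriors (from some text-based classifier), maps each cluster to a distinct role, and then fuses audio and text evidence per segment with a log-linear weight `alpha`. With `alpha=1` it reduces to the cluster-then-label pipeline, where a mis-clustered segment inherits the wrong role (the error propagation noted above); a lower `alpha` lets strong linguistic evidence override the cluster. The function names, `alpha`, and the one-cluster-per-role assumption are all hypothetical.

```python
import math
from itertools import permutations

EPS = 1e-12  # floor to avoid log(0)


def cluster_role_mapping(memberships, text_posteriors, roles):
    """Hypothetical helper: map each audio cluster to one distinct role.

    memberships[i][k]: soft probability that segment i belongs to cluster k.
    text_posteriors[i][r]: probability of role r for segment i (text classifier).
    Assumes the number of clusters equals the number of roles.
    """
    n_clusters = len(memberships[0])
    if n_clusters != len(roles):
        raise ValueError("expected one cluster per role")
    # Membership-weighted log evidence that cluster k speaks in role j.
    score = [[0.0] * len(roles) for _ in range(n_clusters)]
    for m, post in zip(memberships, text_posteriors):
        for k in range(n_clusters):
            for j, r in enumerate(roles):
                score[k][j] += m[k] * math.log(max(post[r], EPS))
    # Exhaustive search over one-to-one mappings (fine for 2-3 roles).
    best = max(permutations(range(len(roles))),
               key=lambda p: sum(score[k][p[k]] for k in range(n_clusters)))
    return [roles[j] for j in best]  # element k is the role of cluster k


def fuse(memberships, text_posteriors, roles, alpha=0.5):
    """Hypothetical per-segment log-linear fusion of audio and text evidence.

    alpha=1 reproduces hard cluster-then-label; alpha=0 ignores the clusters.
    """
    cluster_role = cluster_role_mapping(memberships, text_posteriors, roles)
    labels = []
    for m, post in zip(memberships, text_posteriors):
        # Audio-based role probability: membership in the cluster mapped to r.
        audio = {r: 0.0 for r in roles}
        for k, r in enumerate(cluster_role):
            audio[r] += m[k]
        fused = {r: alpha * math.log(max(audio[r], EPS))
                 + (1 - alpha) * math.log(max(post[r], EPS))
                 for r in roles}
        labels.append(max(fused, key=fused.get))
    return labels


if __name__ == "__main__":
    roles = ["clinician", "patient"]
    # Toy data: segment 3 is weakly clustered with the clinician,
    # but its text strongly suggests the patient.
    mem = [[0.9, 0.1], [0.2, 0.8], [0.95, 0.05], [0.6, 0.4], [0.1, 0.9]]
    post = [{"clinician": 0.8, "patient": 0.2},
            {"clinician": 0.3, "patient": 0.7},
            {"clinician": 0.7, "patient": 0.3},
            {"clinician": 0.05, "patient": 0.95},
            {"clinician": 0.2, "patient": 0.8}]
    print(fuse(mem, post, roles, alpha=1.0))  # pure cluster-then-label
    print(fuse(mem, post, roles, alpha=0.5))  # fused decision
```

On this toy input, the pure pipeline (`alpha=1.0`) labels segment 3 as the clinician because it follows its cluster, while the fused decision (`alpha=0.5`) labels it as the patient. A learned meta-classifier, as proposed in the abstract, would replace the fixed weight with a trained combination.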

Cite as: Flemotomos, N., Papadopoulos, P., Gibson, J., Narayanan, S. (2018) Combined Speaker Clustering and Role Recognition in Conversational Speech. Proc. Interspeech 2018, 1378-1382, DOI: 10.21437/Interspeech.2018-1654.

BibTeX Entry:

@inproceedings{Flemotomos2018,
author={Nikolaos Flemotomos and Pavlos Papadopoulos and James Gibson and Shrikanth Narayanan},
title={Combined Speaker Clustering and Role Recognition in Conversational Speech},
year=2018,
booktitle={Proc. Interspeech 2018},
pages={1378--1382},
doi={10.21437/Interspeech.2018-1654},
url={http://dx.doi.org/10.21437/Interspeech.2018-1654}
}