An Open Source Emotional Speech Corpus for Human Robot Interaction Applications
Jesin James, Li Tian and Catherine Inez Watson
Abstract:
To further understand the wide array of emotions embedded in human speech, we introduce a strictly guided, simulated emotional speech corpus. In contrast to existing speech corpora, it was constructed to maintain an equal distribution of 4 long vowels in New Zealand English. This balance facilitates studies comparing emotion-related formant and glottal source features. The corpus contains 5 primary emotions and 5 secondary emotions. Secondary emotions are important in Human-Robot Interaction (HRI) for modelling natural conversations between humans and robots, yet few existing speech resources cover them, which motivated the creation of this corpus. A large-scale perception test with 120 participants showed that primary and secondary emotions in the corpus are correctly classified with approximately 70% and 40% accuracy, respectively. The reasons behind the difference in perception accuracy between the two emotion types are investigated further. A preliminary prosodic analysis of the corpus shows significant differences among the emotions. The corpus is made public at: github.com/tli725/JL-Corpus.
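As a starting point for the kind of prosodic comparison the abstract mentions, the sketch below estimates mean F0 for a single utterance using the pYIN tracker in librosa. The file name is illustrative only; the actual naming scheme of the wav files in github.com/tli725/JL-Corpus may differ.

import numpy as np
import librosa

# Load one utterance at its native sampling rate (file name is hypothetical).
y, sr = librosa.load("female1_angry_1.wav", sr=None)

# Frame-level F0 estimates via pYIN; unvoiced frames come back as NaN.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Mean F0 over voiced frames: a basic prosodic feature for comparing emotions.
print(f"Mean F0: {np.nanmean(f0):.1f} Hz")

Repeating this per emotion category would give a very coarse version of the prosodic comparison reported in the paper.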
Cite as: James, J., Tian, L., Inez Watson, C. (2018) An Open Source Emotional Speech Corpus for Human Robot Interaction Applications. Proc. Interspeech 2018, 2768-2772, DOI: 10.21437/Interspeech.2018-1349.
BibTeX Entry:
@inproceedings{James2018,
  author={Jesin James and Li Tian and Catherine {Inez Watson}},
  title={An Open Source Emotional Speech Corpus for Human Robot Interaction Applications},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={2768--2772},
  doi={10.21437/Interspeech.2018-1349},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1349}
}