Word Emphasis Prediction for Expressive Text to Speech
Yosi Mass, Slava Shechtman, Moran Mordechay, Ron Hoory, Oren Sar Shalom, Guy Lev and David Konopnicki
Abstract:
Word emphasis prediction is an important part of expressive prosody generation in modern Text-To-Speech (TTS) systems. We present a method for predicting emphasized words for expressive TTS, based on a Deep Neural Network (DNN). We show that the presented method outperforms machine learning methods based on hand-crafted features in terms of objective metrics such as precision and recall. Using a listening test, we further demonstrate that the contribution of the predicted emphasized words to the expressiveness of the synthesized speech is subjectively perceivable.
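As a minimal sketch of the objective metrics named in the abstract, precision and recall can be computed over word-level binary emphasis labels as shown below. The function name `precision_recall`, the example sentence, and the label values are hypothetical illustrations, not data or code from the paper, and the paper's exact evaluation protocol may differ.

```python
def precision_recall(gold, predicted):
    """Word-level precision and recall for binary emphasis labels.

    gold and predicted are equal-length sequences of 0/1 flags,
    where 1 marks a word as emphasized (an assumed encoding).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if p and not g)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    # Return 0.0 when a ratio is undefined (no predicted or no gold positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical sentence and labels, for illustration only.
words = ["I", "really", "love", "this", "song"]
gold = [0, 1, 1, 0, 0]       # annotated emphasis
predicted = [0, 1, 0, 0, 1]  # model output
precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here precision is the fraction of predicted emphasized words that are annotated as emphasized, and recall is the fraction of annotated emphasized words that the predictor recovers.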
Cite as: Mass, Y., Shechtman, S., Mordechay, M., Hoory, R., Sar Shalom, O., Lev, G., Konopnicki, D. (2018) Word Emphasis Prediction for Expressive Text to Speech. Proc. Interspeech 2018, 2868-2872, DOI: 10.21437/Interspeech.2018-1159.
BibTeX Entry:
@inproceedings{Mass2018,
author={Yosi Mass and Slava Shechtman and Moran Mordechay and Ron Hoory and Oren {Sar Shalom} and Guy Lev and David Konopnicki},
title={Word Emphasis Prediction for Expressive Text to Speech},
year={2018},
booktitle={Proc. Interspeech 2018},
pages={2868--2872},
doi={10.21437/Interspeech.2018-1159},
url={http://dx.doi.org/10.21437/Interspeech.2018-1159}
}