Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition
Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su and Dong Yu
Abstract:
In this work, we propose two improvements to attention-based sequence-to-sequence models for end-to-end speech recognition systems. The first is an input-feeding architecture that feeds not only the previous context vector but also the previous decoder hidden state as inputs to the decoder. The second is a better hypothesis generation scheme for sequential minimum Bayes risk (MBR) training of sequence-to-sequence models, in which softmax smoothing is introduced into N-best generation during MBR training. We conduct experiments on both the Switchboard-300hrs and Switchboard+Fisher-2000hrs datasets and observe significant gains from both proposed improvements. Together with other training strategies such as dropout and scheduled sampling, our best model achieves WERs of 8.3%/15.5% on the Switchboard/CallHome subsets of Eval2000 without any external language model, which is highly competitive among state-of-the-art English conversational speech recognition systems.
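To make the first improvement concrete, below is a minimal PyTorch sketch of what an input-feeding decoder step could look like: the previous token embedding is concatenated with the previous context vector and the previous decoder hidden state before entering the recurrent cell. The module structure, dimensions, and attention interface here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class InputFeedingDecoderStep(nn.Module):
    """Illustrative single decoder step with input feeding (not the paper's code)."""

    def __init__(self, embed_dim, hidden_dim, ctx_dim, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Decoder input = previous token embedding + previous context vector
        # + previous decoder hidden state (the "input feeding" part).
        self.cell = nn.LSTMCell(embed_dim + ctx_dim + hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim + ctx_dim, vocab_size)

    def forward(self, prev_token, prev_ctx, prev_state, attention, enc_out):
        prev_h, prev_c = prev_state
        # Concatenate previous embedding, previous context, and previous hidden state.
        x = torch.cat([self.embed(prev_token), prev_ctx, prev_h], dim=-1)
        h, c = self.cell(x, (prev_h, prev_c))
        ctx = attention(h, enc_out)  # attention is any callable returning a context vector
        logits = self.out(torch.cat([h, ctx], dim=-1))
        return logits, ctx, (h, c)
```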
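For the second improvement, the sketch below shows one way softmax smoothing could be applied when scoring beam-search candidates during MBR training: the logits are scaled by a factor smaller than one before the softmax, flattening the distribution so that the N-best list is more diverse. The function name and the smoothing value are assumptions for illustration, not values from the paper.

```python
import torch


def smoothed_log_probs(logits, smoothing=0.8):
    """Scale logits by a smoothing factor < 1 before the softmax so the output
    distribution is flatter and beam search yields more diverse N-best
    hypotheses for MBR training (smoothing=0.8 is an illustrative value)."""
    return torch.log_softmax(smoothing * logits, dim=-1)


# During N-best generation for MBR training, partial hypotheses would be ranked
# with smoothed_log_probs(logits) instead of the usual log_softmax(logits).
```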
Cite as: Weng, C., Cui, J., Wang, G., Wang, J., Yu, C., Su, D., Yu, D. (2018) Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition. Proc. Interspeech 2018, 761-765, DOI: 10.21437/Interspeech.2018-1030.
BiBTeX Entry:
@inproceedings{Weng2018,
  author={Chao Weng and Jia Cui and Guangsen Wang and Jun Wang and Chengzhu Yu and Dan Su and Dong Yu},
  title={Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={761--765},
  doi={10.21437/Interspeech.2018-1030},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1030}
}