Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models
Naoyuki Kanda, Yusuke Fujita and Kenji Nagamatsu
Abstract:
Lattice-free maximum mutual information (LF-MMI) training, which enables MMI-based acoustic model training without any lattice generation procedure, has recently been proposed. Although LF-MMI has shown high accuracy on many tasks, its MMI criterion does not necessarily maximize speech recognition accuracy. In this work, we propose lattice-free state-level minimum Bayes risk (LF-sMBR) training, which maximizes the state-level expected accuracy without relying on a lattice generation procedure. As with LF-MMI, LF-sMBR avoids redundant lattice generation by performing the forward-backward calculation over a phone N-gram space, which enables much simpler and faster training based on the sMBR criterion. We found that special handling of silence phones was essential for improving accuracy with LF-sMBR. In our experiments on the AMI, CSJ, and Librispeech corpora, LF-sMBR achieved small but consistent improvements over LF-MMI AMs, yielding state-of-the-art results on each test set.
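As a loose illustration of the state-level expected accuracy that the sMBR criterion maximizes, the toy sketch below (Python/NumPy; all variable names are hypothetical and not taken from the paper) runs a forward-backward pass over a small, fully specified HMM to obtain state occupation posteriors and then accumulates the expected per-frame state accuracy against a reference alignment. The actual LF-sMBR implementation performs this computation lattice-free over a phone N-gram denominator graph within the LF-MMI (chain) framework, which this sketch does not attempt to reproduce.

# Illustrative sketch only: toy forward-backward computation of the
# state-level expected accuracy maximized by sMBR training.
# All names (log_trans, log_obs, ref_align, ...) are hypothetical.
import numpy as np
from scipy.special import logsumexp

def state_posteriors(log_trans, log_obs):
    """Forward-backward over a small HMM in log space.
    log_trans: (S, S) log transition matrix; log_obs: (T, S) log observation scores.
    Returns gamma: (T, S) state occupation posteriors."""
    T, S = log_obs.shape
    log_alpha = np.full((T, S), -np.inf)
    log_beta = np.full((T, S), -np.inf)
    log_alpha[0] = log_obs[0] - np.log(S)  # uniform initial state distribution
    for t in range(1, T):
        log_alpha[t] = log_obs[t] + logsumexp(log_alpha[t - 1][:, None] + log_trans, axis=0)
    log_beta[-1] = 0.0
    for t in range(T - 2, -1, -1):
        log_beta[t] = logsumexp(log_trans + log_obs[t + 1] + log_beta[t + 1], axis=1)
    log_gamma = log_alpha + log_beta
    log_gamma -= logsumexp(log_gamma, axis=1, keepdims=True)
    return np.exp(log_gamma)

def expected_state_accuracy(gamma, ref_align):
    """Expected number of frames whose state matches the reference alignment:
    sum_t sum_s gamma_t(s) * [s == ref_align[t]]."""
    return sum(gamma[t, ref_align[t]] for t in range(gamma.shape[0]))

# Tiny random example.
rng = np.random.default_rng(0)
S, T = 4, 10
log_trans = np.log(rng.dirichlet(np.ones(S), size=S))
log_obs = rng.normal(size=(T, S))
ref_align = rng.integers(0, S, size=T)
gamma = state_posteriors(log_trans, log_obs)
print("expected state accuracy:", expected_state_accuracy(gamma, ref_align))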
Cite as: Kanda, N., Fujita, Y., Nagamatsu, K. (2018) Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models. Proc. Interspeech 2018, 2923-2927, DOI: 10.21437/Interspeech.2018-79.
BiBTeX Entry:
@inproceedings{Kanda2018,
  author={Naoyuki Kanda and Yusuke Fujita and Kenji Nagamatsu},
  title={Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={2923--2927},
  doi={10.21437/Interspeech.2018-79},
  url={http://dx.doi.org/10.21437/Interspeech.2018-79}
}