Comparison of BLSTM-Layer-Specific Affine Transformations for Speaker Adaptation

Markus Kitza, Ralf Schlüter and Hermann Ney

Abstract:

Bidirectional Long Short-Term Memory (BLSTM) recurrent neural network (RNN) acoustic models have demonstrated superior performance over deep feed-forward neural network (DNN) models in speech recognition and many other tasks. Although a lot of work has been reported on DNN model adaptation, very little has been done for BLSTM models. This work presents a systematic study of the adaptation of BLSTM acoustic models by learning affine transformations within the neural network on small amounts of unsupervised adaptation data. Through a series of experiments on two major speech recognition benchmarks (Switchboard and CHiME-4), we investigate how the position of the transformation within a BLSTM network affects performance, using separate transformations for the forward and backward directions. We observe that applying affine transformations yields consistent relative word error rate reductions ranging from 6% to 11%, depending on the task and the degree of mismatch between training and test data.
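
The scheme described in the abstract can be pictured as an identity-initialized affine layer inserted between BLSTM layers and fine-tuned per speaker. Below is a minimal sketch in PyTorch, assuming a standard stacked BLSTM; the class names SpeakerAffine and AdaptedBLSTM, the dimensions, and the placement logic are illustrative assumptions, not the authors' implementation. The paper itself only specifies that a separate affine transform is learned for the forward and the backward direction at a chosen layer.

    # Illustrative sketch only; not the authors' code.
    import torch
    import torch.nn as nn

    class SpeakerAffine(nn.Module):
        """Per-direction affine transform y = Wx + b, initialized to identity."""
        def __init__(self, dim: int):
            super().__init__()
            # Separate transforms for the forward and backward BLSTM outputs.
            self.fwd = nn.Linear(dim, dim)
            self.bwd = nn.Linear(dim, dim)
            for layer in (self.fwd, self.bwd):
                nn.init.eye_(layer.weight)   # identity start: no change before adaptation
                nn.init.zeros_(layer.bias)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (time, batch, 2 * dim), concatenated forward/backward states
            h_fwd, h_bwd = x.chunk(2, dim=-1)
            return torch.cat([self.fwd(h_fwd), self.bwd(h_bwd)], dim=-1)

    class AdaptedBLSTM(nn.Module):
        """Stacked BLSTM with a speaker-dependent affine transform after one layer."""
        def __init__(self, feat_dim: int, hidden: int, layers: int, adapt_layer: int):
            super().__init__()
            self.blstms = nn.ModuleList(
                nn.LSTM(feat_dim if i == 0 else 2 * hidden, hidden, bidirectional=True)
                for i in range(layers)
            )
            self.adapt_layer = adapt_layer
            self.affine = SpeakerAffine(hidden)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            for i, blstm in enumerate(self.blstms):
                x, _ = blstm(x)
                if i == self.adapt_layer:
                    x = self.affine(x)  # apply the transform at the chosen position
            return x

During adaptation, one would freeze all network parameters except those of the affine module and fine-tune on the speaker's unsupervised adaptation data, so that only the identity-initialized transform moves away from the speaker-independent model.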


Cite as: Kitza, M., Schlüter, R., Ney, H. (2018) Comparison of BLSTM-Layer-Specific Affine Transformations for Speaker Adaptation. Proc. Interspeech 2018, 877-881, DOI: 10.21437/Interspeech.2018-2022.


BibTeX Entry:

@inproceedings{Kitza2018,
  author={Markus Kitza and Ralf Schlüter and Hermann Ney},
  title={Comparison of BLSTM-Layer-Specific Affine Transformations for Speaker Adaptation},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={877--881},
  doi={10.21437/Interspeech.2018-2022},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2022}
}