Conditional-Computation-Based Recurrent Neural Networks for Computationally Efficient Acoustic Modelling
Raffaele Tavarone and Leonardo Badino
Abstract:
The first step in Automatic Speech Recognition (ASR) is a fixed-rate segmentation of the acoustic signal into overlapping windows of fixed length. Although this procedure makes it possible to achieve excellent recognition accuracy, it is far from computationally efficient, in that it may produce a highly redundant signal (i.e., almost identical spectral vectors may span many observation windows), which translates into computational overhead. Reducing this overhead can be very beneficial for applications such as offline ASR on mobile devices. In this paper we present a principled way of saving numerical operations during ASR by using conditional-computation methods in deep bidirectional Recurrent Neural Networks (RNNs) for acoustic modelling. The methods rely on learned binary neurons that allow hidden layers to be updated only when necessary, keeping their previous value otherwise. We (i) evaluate, for the first time, conditional-computation-based recurrent architectures on a speech recognition task and (ii) propose a novel model, specifically designed for speech data, that inherently builds a multi-scale temporal structure in its hidden layers. Results on the TIMIT dataset show that conditional mechanisms in recurrent architectures can reduce hidden layer updates by up to 40% at the cost of about a 20% relative increase in phone error rate.
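The abstract only sketches the mechanism (learned binary neurons gating hidden-state updates), so the snippet below is a minimal, Skip-RNN-style illustration of that idea rather than the exact architecture from the paper. It assumes a hypothetical `ConditionalGRUCell` in which a learned gate, binarized with a straight-through gradient estimator, decides per frame whether to run the recurrent update or copy the previous state; all names and hyperparameters are illustrative. For clarity the candidate update is computed unconditionally here; an efficiency-oriented implementation would skip it whenever the gate is closed.

```python
import torch
import torch.nn as nn


class BinaryGateST(torch.autograd.Function):
    """Binarize a probability in {0, 1} with a straight-through gradient."""

    @staticmethod
    def forward(ctx, p):
        return (p > 0.5).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Pass the gradient through the non-differentiable threshold.
        return grad_output


class ConditionalGRUCell(nn.Module):
    """GRU cell whose state is updated only when a learned binary gate fires;
    otherwise the previous hidden state is carried over unchanged.
    (Illustrative sketch, not the authors' exact model.)"""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.gate = nn.Linear(input_size + hidden_size, 1)

    def forward(self, x_t, h_prev):
        # Probability of updating the state at this frame.
        p = torch.sigmoid(self.gate(torch.cat([x_t, h_prev], dim=-1)))
        u = BinaryGateST.apply(p)           # hard 0/1 decision per sequence
        h_new = self.cell(x_t, h_prev)      # candidate update (always computed here)
        h_t = u * h_new + (1.0 - u) * h_prev
        return h_t, u                       # u can be penalised to encourage fewer updates


if __name__ == "__main__":
    # Toy usage: a batch of 8 frames of 40-dim filterbank features.
    cell = ConditionalGRUCell(40, 128)
    x = torch.randn(8, 40)
    h = torch.zeros(8, 128)
    h, u = cell(x, h)
    print(h.shape, u.mean().item())         # fraction of sequences that updated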
Cite as: Tavarone, R., Badino, L. (2018) Conditional-Computation-Based Recurrent Neural Networks for Computationally Efficient Acoustic Modelling. Proc. Interspeech 2018, 1274-1278, DOI: 10.21437/Interspeech.2018-2195.
BiBTeX Entry:
@inproceedings{Tavarone2018,
  author    = {Raffaele Tavarone and Leonardo Badino},
  title     = {Conditional-Computation-Based Recurrent Neural Networks for Computationally Efficient Acoustic Modelling},
  year      = {2018},
  booktitle = {Proc. Interspeech 2018},
  pages     = {1274--1278},
  doi       = {10.21437/Interspeech.2018-2195},
  url       = {http://dx.doi.org/10.21437/Interspeech.2018-2195}
}