Exploring How Phone Classification Neural Networks Learn Phonetic Information by Visualising and Interpreting Bottleneck Features
Linxue Bai, Philip Weber, Peter Jančovič and Martin Russell
Abstract:
Neural networks have a reputation for being "black boxes", and it has been suggested that techniques from user interface development, and visualisation in particular, could help lift this reputation. In this paper, we explore 9-dimensional bottleneck features (BNFs), which our earlier work has shown to represent speech well in the context of speech recognition, and 2-dimensional BNFs extracted directly from bottleneck neural networks. The 9-dimensional BNFs obtained from a phone classification neural network are visualised in 2-dimensional spaces using linear discriminant analysis (LDA) and t-distributed stochastic neighbour embedding (t-SNE). The 2-dimensional BNF space is analysed with regard to phonetic features. A back-propagation method is used to create "cardinal" features for each phone under a particular neural network. The visualisations of both the 9-dimensional and the 2-dimensional BNFs show distinctions between most phone categories. In particular, the 2-dimensional BNF space seems to be a union of subspaces related to phonetic categories. Each subspace preserves local structure, and the organisation of phones within it appears to correspond to phone production mechanisms. By applying LDA to the features of higher-dimensional non-bottleneck layers, we observe a triangular pattern, which may indicate that silence, friction and voicing are the three main properties learned by the neural networks.
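The LDA step mentioned in the abstract, projecting 9-dimensional features onto a 2-dimensional space for plotting, can be illustrated with a minimal sketch. This is not the authors' implementation: the data below are synthetic Gaussian clusters standing in for BNFs, the three classes stand in for phone categories, and the names `lda_project`, `bnf` and `phone_ids` are hypothetical. NumPy is assumed to be available; t-SNE is not shown.

```python
import numpy as np

def lda_project(features, labels, n_components=2):
    """Project features onto the leading LDA directions (Fisher criterion)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    dim = features.shape[1]
    overall_mean = features.mean(axis=0)

    s_within = np.zeros((dim, dim))   # within-class scatter
    s_between = np.zeros((dim, dim))  # between-class scatter
    for c in np.unique(labels):
        x_c = features[labels == c]
        mean_c = x_c.mean(axis=0)
        centred = x_c - mean_c
        s_within += centred.T @ centred
        diff = (mean_c - overall_mean)[:, None]
        s_between += len(x_c) * (diff @ diff.T)

    # Directions maximising between-class over within-class scatter are the
    # leading eigenvectors of pinv(S_w) @ S_b.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(s_within) @ s_between)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    w = eigvecs[:, order].real
    return (features - overall_mean) @ w

# Synthetic stand-in for 9-dimensional BNFs (assumption: not real speech data).
rng = np.random.default_rng(0)
n_classes, n_per_class, dim = 3, 200, 9
class_means = rng.normal(scale=3.0, size=(n_classes, dim))
bnf = np.vstack([m + rng.normal(size=(n_per_class, dim)) for m in class_means])
phone_ids = np.repeat(np.arange(n_classes), n_per_class)

projected = lda_project(bnf, phone_ids)
print(projected.shape)  # (600, 2)
```

The resulting 2-dimensional points could then be passed to any scatter-plot routine, coloured by class label, to inspect how well the categories separate.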
Cite as: Bai, L., Weber, P., Jančovič, P., Russell, M. (2018) Exploring How Phone Classification Neural Networks Learn Phonetic Information by Visualising and Interpreting Bottleneck Features. Proc. Interspeech 2018, 1472-1476, DOI: 10.21437/Interspeech.2018-2462.
BibTeX Entry:
@inproceedings{Bai2018,
author={Linxue Bai and Philip Weber and Peter Jančovič and Martin Russell},
title={Exploring How Phone Classification Neural Networks Learn Phonetic Information by Visualising and Interpreting Bottleneck Features},
year=2018,
booktitle={Proc. Interspeech 2018},
pages={1472--1476},
doi={10.21437/Interspeech.2018-2462},
url={http://dx.doi.org/10.21437/Interspeech.2018-2462} }