Full Bayesian Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery
Thomas Glarner, Patrick Hanebrink, Janek Ebbers and Reinhold Haeb-Umbach
Abstract:
The invention of the Variational Autoencoder enables the application of neural networks to a wide range of tasks in unsupervised learning, including the field of Acoustic Unit Discovery (AUD). The recently proposed Hidden Markov Model Variational Autoencoder (HMMVAE) allows joint training of a neural-network-based feature extractor and a structured prior for the latent space given by a Hidden Markov Model. It has been shown that the HMMVAE significantly outperforms pure GMM-HMM based systems on the AUD task. However, the HMMVAE cannot autonomously infer the number of acoustic units and thus relies on the GMM-HMM system for initialization. This paper introduces the Bayesian Hidden Markov Model Variational Autoencoder (BHMMVAE), which solves these issues by embedding the HMMVAE in a Bayesian framework with a Dirichlet process prior for the distribution of the acoustic units and diagonal- or full-covariance Gaussians as emission distributions. Experiments on TIMIT and Xitsonga show that the BHMMVAE is able to autonomously infer a reasonable number of acoustic units, can be initialized without supervision by a GMM-HMM system, achieves computationally efficient stochastic variational inference by using natural gradient descent and, additionally, improves the AUD performance over the HMMVAE.
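To illustrate how a Dirichlet process prior lets a model infer the number of units on its own, the following is a minimal sketch of the standard truncated stick-breaking construction. The concentration `alpha` and truncation level are illustrative values, not those used in the paper; the point is that mass concentrates on a few "active" components even though the truncation allows many more.

```python
import numpy as np

def stick_breaking_weights(rng, alpha, truncation):
    """Truncated stick-breaking construction of Dirichlet process weights.

    alpha: concentration parameter (larger -> more active components).
    truncation: maximum number of components considered.
    """
    # Draw stick-breaking proportions Beta(1, alpha).
    betas = rng.beta(1.0, alpha, size=truncation)
    # Fraction of the stick remaining before each break.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return betas * remaining

rng = np.random.default_rng(0)
w = stick_breaking_weights(rng, alpha=2.0, truncation=50)
# Only a handful of the 50 possible components carry noticeable weight,
# so the effective number of units is inferred rather than fixed a priori.
active = int(np.sum(w > 1e-3))
```

In the BHMMVAE this kind of prior sits over the acoustic-unit distribution, so units whose weights shrink toward zero are effectively pruned during inference.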
Cite as: Glarner, T., Hanebrink, P., Ebbers, J., Haeb-Umbach, R. (2018) Full Bayesian Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery. Proc. Interspeech 2018, 2688-2692, DOI: 10.21437/Interspeech.2018-2148.
BiBTeX Entry:
@inproceedings{Glarner2018,
  author={Thomas Glarner and Patrick Hanebrink and Janek Ebbers and Reinhold Haeb-Umbach},
  title={Full Bayesian Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={2688--2692},
  doi={10.21437/Interspeech.2018-2148},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2148}
}