ASe: Acoustic Scene Embedding Using Deep Archetypal Analysis and GMM
Pulkit Sharma, Vinayak Abrol and Anshul Thakur
Abstract:
In this paper, we propose a deep learning framework that combines the generalizability of Gaussian mixture models (GMMs) and the discriminative power of deep matrix factorization to learn an acoustic scene embedding (ASe) for the acoustic scene classification task. The proposed approach first builds a Gaussian mixture model-universal background model (GMM-UBM) using frame-wise spectral representations. This UBM is adapted to each waveform, and the likelihoods of its spectral frame representations are stored as a feature matrix. This matrix is fed to a deep matrix factorization pipeline (with audio-recording-level max-pooling) to compute a sparse-convex discriminative representation. The proposed deep factorization model is based on archetypal analysis, a form of convex NMF that has been shown to be well suited for audio analysis. Finally, the obtained representation is mapped to a class label using a dictionary-based auto-encoder consisting of a linear, symmetric encoder and decoder with an efficient learning algorithm. The encoder projects the ASe of a waveform to the label space, while the decoder ensures that the feature can be reconstructed, resulting in better generalization on the test data.
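The following is a minimal Python sketch of the pipeline outlined above, assuming frame-wise MFCC/log-mel features as input. It uses scikit-learn's GaussianMixture as a stand-in for the GMM-UBM and reduces the deep archetypal-analysis stage to a single convex-coding layer with projected-gradient updates and recording-level max-pooling; it illustrates the idea under those assumptions and is not the paper's exact adaptation, deep factorization, or dictionary auto-encoder.

# Illustrative sketch only: scikit-learn's GaussianMixture stands in for the
# GMM-UBM, and one convex-coding layer stands in for the deep archetypal
# analysis described in the abstract.
import numpy as np
from sklearn.mixture import GaussianMixture


def train_ubm(frames, n_components=64):
    """Fit a GMM-UBM on pooled frame-wise spectral features (n_frames x dim)."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag")
    ubm.fit(frames)
    return ubm


def likelihood_matrix(ubm, frames):
    """Per-frame component responsibilities for one recording (n_frames x n_components)."""
    return ubm.predict_proba(frames)


def archetypal_embedding(feature_matrix, archetypes, n_iter=50, step=0.01):
    """Convex coding of each frame against the archetypes, followed by
    recording-level max-pooling to obtain a single ASe-style vector."""
    n_frames = feature_matrix.shape[0]
    k = archetypes.shape[0]
    A = np.full((n_frames, k), 1.0 / k)              # convex coefficients (rows on the simplex)
    for _ in range(n_iter):                          # simple projected-gradient updates
        grad = (A @ archetypes - feature_matrix) @ archetypes.T
        A = np.clip(A - step * grad, 0.0, None)
        A /= A.sum(axis=1, keepdims=True) + 1e-12    # renormalise rows onto the simplex
    return A.max(axis=0)                             # recording-level max-pooling


# Example usage (random arrays standing in for MFCC frames; archetypes are
# crudely initialised from the recording's own likelihood rows):
# rng = np.random.default_rng(0)
# ubm = train_ubm(rng.standard_normal((5000, 40)))
# F = likelihood_matrix(ubm, rng.standard_normal((300, 40)))
# Z = F[rng.choice(len(F), size=16, replace=False)]
# ase = archetypal_embedding(F, Z)    # fixed-length embedding for a classifier

In the full method, this fixed-length embedding is what the linear, symmetric dictionary-based auto-encoder maps to the label space; a plain linear classifier could be trained on it in this simplified setting.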
Cite as: Sharma, P., Abrol, V., Thakur, A. (2018) ASe: Acoustic Scene Embedding Using Deep Archetypal Analysis and GMM. Proc. Interspeech 2018, 3299-3303, DOI: 10.21437/Interspeech.2018-1481.
BibTeX Entry:
@inproceedings{Sharma2018,
  author={Pulkit Sharma and Vinayak Abrol and Anshul Thakur},
  title={ASe: Acoustic Scene Embedding Using Deep Archetypal Analysis and GMM},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3299--3303},
  doi={10.21437/Interspeech.2018-1481},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1481}
}