Bidirectional Long-Short Term Memory Network-based Estimation of Reliable Spectral Component Locations
Aaron Nicolson and Kuldip K. Paliwal
Abstract:
An accurate Ideal Binary Mask (IBM) estimate is essential for Missing Feature Theory (MFT)-based speaker identification: each spectral component is labelled as either reliable or unreliable, and incorrectly labelled components degrade the performance of an Automatic Speaker Identification (ASI) system in the presence of noise. In this work, a Bidirectional Recurrent Neural Network (BRNN) with Long-Short Term Memory (LSTM) cells is proposed for improved IBM estimation. Compared to the previously proposed Multilayer Perceptron (MLP)-IBM estimator, the proposed system improved IBM estimate accuracy by an average of 4.5% and MFT-based speaker identification accuracy by an average of 3.1% over all tested SNR levels. When used for speech enhancement, the proposed system improved MOS-LQO (an objective quality measure) by an average of 0.32 and QSTI (an objective intelligibility measure) by an average of 0.01 over all tested SNR levels, again compared to the MLP-IBM estimator. These results highlight the effectiveness of the proposed BRNN-IBM estimator for MFT-based speaker identification and IBM-based speech enhancement.
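To make the reliable/unreliable labelling concrete, the following is a minimal sketch of how an IBM and the accuracy of an IBM estimate might be computed. It is not the authors' implementation: the function names, the 0 dB local SNR criterion, and the toy magnitude values are all assumptions made for illustration, and the abstract does not state which criterion the paper uses.

```python
import numpy as np

def ideal_binary_mask(clean_mag, noise_mag, lc_db=0.0):
    """Label each spectral component as reliable (1) or unreliable (0).

    Assumption: a component is reliable when its local SNR exceeds
    lc_db (0 dB by default). This threshold is illustrative only.
    """
    eps = np.finfo(float).eps  # guards against division by zero and log(0)
    local_snr_db = 20.0 * np.log10((clean_mag + eps) / (noise_mag + eps))
    return (local_snr_db > lc_db).astype(np.uint8)

def mask_accuracy(estimate, reference):
    """Fraction of spectral components whose label matches the reference."""
    return float(np.mean(estimate == reference))

# Toy magnitude spectra (hypothetical values): rows are frames, columns are bins.
clean = np.array([[1.0, 0.1], [0.4, 2.0]])
noise = np.array([[0.5, 0.5], [0.5, 0.1]])

ibm = ideal_binary_mask(clean, noise)

# A hypothetical estimator output with one mislabelled component.
estimate = np.array([[1, 1], [0, 1]], dtype=np.uint8)
accuracy = mask_accuracy(estimate, ibm)
```

In this toy case one of the four components is mislabelled, so the estimate accuracy is 0.75. In the setting the abstract describes, the estimate would come from the MLP or BRNN estimator rather than being written by hand.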
Cite as: Nicolson, A., Paliwal, K.K. (2018) Bidirectional Long-Short Term Memory Network-based Estimation of Reliable Spectral Component Locations. Proc. Interspeech 2018, 1606-1610, DOI: 10.21437/Interspeech.2018-1134.
BibTeX Entry:
@inproceedings{Nicolson2018,
author={Aaron Nicolson and Kuldip K. Paliwal},
title={Bidirectional Long-Short Term Memory Network-based Estimation of Reliable Spectral Component Locations},
year=2018,
booktitle={Proc. Interspeech 2018},
pages={1606--1610},
doi={10.21437/Interspeech.2018-1134},
url={http://dx.doi.org/10.21437/Interspeech.2018-1134} }