


A Priori SNR Estimation Based on a Recurrent Neural Network for Robust Speech Enhancement

Yangyang Xia and Richard Stern

Abstract:

Speech enhancement under highly non-stationary noise conditions remains a challenging problem. Classical methods typically attempt to identify a frequency-domain optimal gain function that suppresses noise in noisy speech. These algorithms often produce artifacts such as “musical noise” that are detrimental to both machine and human understanding, largely due to inaccurate estimation of the noise power spectrum. In neural-network-based systems, the optimal gain function is commonly referred to as the ideal ratio mask (IRM), and the goal becomes estimating the IRM from the short-time Fourier transform amplitude of degraded speech. While these data-driven techniques are able to enhance speech quality with fewer artifacts, they are frequently not robust to noise types to which they were not exposed during training. In this paper, we propose a novel recurrent neural network (RNN) that bridges the gap between classical and neural-network-based methods. By reformulating the classical decision-directed approach, the a priori and a posteriori SNRs become latent variables in the RNN, from which the frequency-dependent estimated likelihood of speech presence is used to recursively update the latent variables. The proposed method provides substantial enhancement of speech quality and objective accuracy in machine interpretation of speech.
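For context, the classical decision-directed approach that the paper reformulates estimates the a priori SNR by blending the previous frame's enhanced-speech estimate with the current instantaneous estimate derived from the a posteriori SNR. The sketch below is a minimal NumPy illustration of that classical estimator (Ephraim and Malah's decision-directed rule) with a Wiener gain, not the RNN proposed in the paper; the smoothing constant `alpha=0.98` is a conventional choice, not a value taken from this work.

```python
import numpy as np

def decision_directed_snr(noisy_power, noise_power, prev_clean_power, alpha=0.98):
    """One frame of the classical decision-directed a priori SNR estimate.

    All inputs are per-frequency power spectra (NumPy arrays of equal shape):
      noisy_power      -- |Y(t, f)|^2 of the current noisy frame
      noise_power      -- estimated noise power spectrum
      prev_clean_power -- |A_hat(t-1, f)|^2, the previous enhanced frame
    Returns the a priori SNR estimate xi and the resulting Wiener gain.
    """
    eps = 1e-12  # guard against division by zero
    # A posteriori SNR: observed noisy power relative to the noise power.
    gamma = noisy_power / np.maximum(noise_power, eps)
    # Decision-directed blend: smoothed previous estimate plus the
    # half-wave-rectified instantaneous maximum-likelihood estimate.
    xi = (alpha * prev_clean_power / np.maximum(noise_power, eps)
          + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))
    # Wiener gain derived from the a priori SNR.
    gain = xi / (1.0 + xi)
    return xi, gain
```

In the paper's formulation, these recursively updated SNR quantities become latent variables of the RNN rather than being computed by a fixed rule as above.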


Cite as: Xia, Y., Stern, R. (2018) A Priori SNR Estimation Based on a Recurrent Neural Network for Robust Speech Enhancement. Proc. Interspeech 2018, 3274-3278, DOI: 10.21437/Interspeech.2018-2423.


BibTeX Entry:

@inproceedings{Xia2018,
  author={Yangyang Xia and Richard Stern},
  title={A Priori {SNR} Estimation Based on a Recurrent Neural Network for Robust Speech Enhancement},
  booktitle={Proc. Interspeech 2018},
  year={2018},
  pages={3274--3278},
  doi={10.21437/Interspeech.2018-2423},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2423}
}