Biophysically-inspired Features Improve the Generalizability of Neural Network-based Speech Enhancement Systems
Deepak Baby and Sarah Verhulst
Abstract:
Recent neural network (NN)-based speech enhancement schemes have been shown to outperform most conventional techniques. However, the performance of such systems in adverse listening conditions, such as negative signal-to-noise ratios and unseen noises, is still far from that of humans. Motivated by the remarkable performance of humans under these challenging conditions, this paper investigates whether biophysically-inspired features can mitigate the poor generalization capabilities of NN-based speech enhancement systems. We use features derived from several human auditory periphery models to train a speech enhancement system that employs long short-term memory (LSTM), and evaluate them under a variety of mismatched testing conditions. The results reveal that features from biophysically-inspired auditory models, such as nonlinear transmission line models, improve the generalizability of LSTM-based noise suppression systems in terms of various objective quality measures, suggesting that such features lead to robust speech representations that are less sensitive to the noise type.
Cite as: Baby, D., Verhulst, S. (2018) Biophysically-inspired Features Improve the Generalizability of Neural Network-based Speech Enhancement Systems. Proc. Interspeech 2018, 3264-3268, DOI: 10.21437/Interspeech.2018-1237.
BibTeX Entry:
@inproceedings{Baby2018,
author={Deepak Baby and Sarah Verhulst},
title={Biophysically-inspired Features Improve the Generalizability of Neural Network-based Speech Enhancement Systems},
year=2018,
booktitle={Proc. Interspeech 2018},
pages={3264--3268},
doi={10.21437/Interspeech.2018-1237},
url={http://dx.doi.org/10.21437/Interspeech.2018-1237}
}