Multi-frame Quantization of LSF Parameters Using a Deep Autoencoder and Pyramid Vector Quantizer
Yaxing Li, Eshete Derb Emiru, Shengwu Xiong, Anna Zhu, Pengfei Duan and Yichang Li
Abstract:
This paper presents a multi-frame quantization method for line spectral frequency (LSF) parameters using a deep autoencoder (DAE) and a pyramid vector quantizer (PVQ). The objective is to provide sophisticated LSF quantization for ultra-low bit rate speech coders with moderate delay. For the compression and de-correlation of multiple LSF frames, a DAE with linear coder-layer units and Gaussian noise is used. The DAE demonstrates a high degree of modelling flexibility for multiple LSF frames. To quantize the coder-layer vector effectively, a PVQ is used. Compared with the discrete cosine model (DCM), the DAE-based compression models multi-frame LSF parameters more accurately and has the advantage that the coder-layer dimension can take any value. The coder-layer dimension of the DAE governs the trade-off between the modelling distortion and the coder-layer quantization distortion. The experimental results show that the proposed algorithm, with the determined optimal coder-layer dimension, outperforms the DCM-based multi-frame LSF quantization approach in terms of spectral distortion (SD) performance and robustness across different speech segments.
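The abstract does not spell out the PVQ configuration, so the following is only a minimal sketch of a generic pyramid vector quantizer as it is commonly defined: the codebook is the set of integer vectors whose absolute values sum to a fixed pulse count K. The function names, the greedy pulse allocation, and the example vector are assumptions made for illustration and are not taken from the paper; gain handling and the mapping from codeword to bit index are omitted.

```python
from functools import lru_cache
import math


@lru_cache(maxsize=None)
def pvq_codebook_size(dim, pulses):
    """Count integer vectors of length `dim` whose absolute values sum to `pulses`."""
    if pulses == 0:
        return 1  # only the all-zero vector
    if dim == 0:
        return 0  # no coordinates left to hold the remaining pulses
    return (pvq_codebook_size(dim - 1, pulses)
            + pvq_codebook_size(dim, pulses - 1)
            + pvq_codebook_size(dim - 1, pulses - 1))


def pvq_quantize(x, pulses):
    """Map a real vector to a nearby integer vector with L1 norm equal to `pulses`.

    Hypothetical helper: scales |x| onto the pyramid, floors each magnitude,
    then hands the leftover pulses to the largest fractional remainders.
    """
    total = sum(abs(v) for v in x)
    if total == 0.0:
        # Degenerate input: place every pulse on the first coordinate.
        return [pulses] + [0] * (len(x) - 1)
    scaled = [abs(v) * pulses / total for v in x]
    mags = [math.floor(s) for s in scaled]
    remaining = pulses - sum(mags)
    # Coordinates ordered by descending fractional remainder.
    order = sorted(range(len(x)), key=lambda i: scaled[i] - mags[i], reverse=True)
    for i in order[:max(remaining, 0)]:
        mags[i] += 1
    # Guard against floating-point rounding that overshoots the pulse budget.
    for i in reversed(order):
        if remaining >= 0:
            break
        if mags[i] > 0:
            mags[i] -= 1
            remaining += 1
    return [m if v >= 0 else -m for m, v in zip(mags, x)]


if __name__ == "__main__":
    coder_vector = [0.9, -0.1, 0.4]  # made-up stand-in for a coder-layer vector
    codeword = pvq_quantize(coder_vector, pulses=4)
    bits = math.ceil(math.log2(pvq_codebook_size(len(coder_vector), 4)))
    print(codeword, bits)
```

In such a scheme the bit budget for the coder-layer vector is set by the codebook size, which grows with both the dimension and the pulse count. This is one way to see the trade-off the abstract describes: a larger coder-layer dimension lowers modelling distortion but leaves fewer pulses per dimension at a fixed bit rate.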
Cite as: Li, Y., Emiru, E.D., Xiong, S., Zhu, A., Duan, P., Li, Y. (2018) Multi-frame Quantization of LSF Parameters Using a Deep Autoencoder and Pyramid Vector Quantizer. Proc. Interspeech 2018, 3553-3557, DOI: 10.21437/Interspeech.2018-2577.
BibTeX Entry:
@inproceedings{Li2018,
author={Yaxing Li and Eshete Derb Emiru and Shengwu Xiong and Anna Zhu and Pengfei Duan and Yichang Li},
title={Multi-frame Quantization of LSF Parameters Using a Deep Autoencoder and Pyramid Vector Quantizer},
year=2018,
booktitle={Proc. Interspeech 2018},
pages={3553--3557},
doi={10.21437/Interspeech.2018-2577},
url={http://dx.doi.org/10.21437/Interspeech.2018-2577} }