Interspeech 2018

Multicomponent 2-D AM-FM Modeling of Speech Spectrograms

Jitendra Kumar Dhiman, Neeraj Sharma and Chandra Sekhar Seelamantula

Abstract:

In contrast to 1-D short-time analysis of speech, 2-D modeling of spectrograms characterizes speech attributes directly in the joint time-frequency plane. Building on existing 2-D models for analyzing a spectrogram patch, we propose a multicomponent 2-D AM-FM representation for spectrogram decomposition. The representation comprises a DC component, a fundamental-frequency carrier with its harmonics, and a spectrotemporal envelope, all in 2-D. The number of harmonics required is patch-dependent. The AM and FM are estimated using the Riesz transform, and the component weights are estimated using a least-squares approach. The proposed representation improves over existing state-of-the-art approaches for both male and female speakers, as quantified by reconstruction SNR and the perceptual evaluation of speech quality (PESQ) metric. Further, we perform overlap-add on the DC component, pooling all the patches, to obtain a time-frequency (t-f) aperiodicity map for the speech signal. We verify its effectiveness in improving speech synthesis quality by using it in an existing state-of-the-art vocoder.
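
As a rough illustration of the least-squares step mentioned in the abstract, the sketch below fits the weights of a DC term and a few cosine harmonics to a synthetic 2-D patch. It is not the authors' implementation: the 2-D phase is assumed to be known rather than estimated with the Riesz transform, the spectrotemporal envelope is assumed constant, and the function name fit_harmonic_weights, the patch size, the carrier frequencies, and the noise level are all hypothetical choices made for the example.

```python
import numpy as np


def fit_harmonic_weights(patch, phase, num_harmonics):
    """Least-squares weights for a DC term plus cosine harmonics of a 2-D phase.

    Hypothetical helper: assumes the 2-D carrier phase is already known and
    that the envelope is constant over the patch.
    """
    # Design matrix columns: DC, cos(phase), cos(2*phase), ..., one row per pixel.
    columns = [np.ones(patch.size)]
    for k in range(1, num_harmonics + 1):
        columns.append(np.cos(k * phase).ravel())
    design = np.stack(columns, axis=1)

    # Solve min_w || design @ w - patch ||^2.
    weights, *_ = np.linalg.lstsq(design, patch.ravel(), rcond=None)
    reconstruction = (design @ weights).reshape(patch.shape)
    return weights, reconstruction


# Synthetic 32x32 "spectrogram patch" (assumed values, not data from the paper).
t, f = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
phase = 2 * np.pi * (0.02 * t + 0.11 * f)  # assumed planar 2-D carrier phase
true_weights = np.array([0.5, 1.0, 0.3])   # DC, fundamental, second harmonic

rng = np.random.default_rng(0)
patch = (
    true_weights[0]
    + true_weights[1] * np.cos(phase)
    + true_weights[2] * np.cos(2 * phase)
    + 0.01 * rng.standard_normal(phase.shape)  # small additive noise
)

weights, reconstruction = fit_harmonic_weights(patch, phase, num_harmonics=2)

# Reconstruction SNR in dB, one of the two metrics named in the abstract.
snr_db = 10 * np.log10(np.sum(patch**2) / np.sum((patch - reconstruction) ** 2))
print("estimated weights:", weights)
print("reconstruction SNR (dB):", snr_db)
```

In this toy setting the number of harmonics is fixed at two; the abstract notes that in the proposed method it is patch-dependent.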

Cite as: Dhiman, J.K., Sharma, N., Seelamantula, C.S. (2018) Multicomponent 2-D AM-FM Modeling of Speech Spectrograms. Proc. Interspeech 2018, 736-740, DOI: 10.21437/Interspeech.2018-1937.

BibTeX Entry:

@inproceedings{Dhiman2018,
author={Jitendra Kumar Dhiman and Neeraj Sharma and Chandra Sekhar Seelamantula},
title={Multicomponent 2-D AM-FM Modeling of Speech Spectrograms},
year=2018,
booktitle={Proc. Interspeech 2018},
pages={736--740},
doi={10.21437/Interspeech.2018-1937},
url={http://dx.doi.org/10.21437/Interspeech.2018-1937}
}