Interspeech 2018

Low Resource Acoustic-to-articulatory Inversion Using Bi-directional Long Short Term Memory

Abstract:

Estimating articulatory movements from speech acoustic features is known as acoustic-to-articulatory inversion (AAI). Training an AAI model in a subject-dependent manner, referred to as subject-dependent AAI (SD-AAI), requires a large amount of parallel speech and articulatory motion data. Electromagnetic articulograph (EMA) is a promising technology for recording such parallel data, but it is expensive, time-consuming, and tiring for the subject. To reduce the demand for parallel acoustic-articulatory data from a subject, we propose a subject-adaptive AAI method (SA-AAI) derived from an existing AAI model that is trained on a large amount of parallel data from a fixed set of subjects. Experiments are performed with acoustic-articulatory data from 30 subjects, with AAI trained using a bi-directional long short-term memory (BLSTM) network, to examine how much data is needed from a new target subject for SA-AAI to achieve performance equivalent to that of SD-AAI. Experimental results reveal that the proposed SA-AAI performs on par with SD-AAI using ∼62.5% less training data. Among the different articulators, SA-AAI performance for the tongue articulators matches the corresponding SD-AAI performance with only ∼12.5% of the data used for SD-AAI training.

Cite as: Illa, A., Ghosh, P.K. (2018) Low Resource Acoustic-to-articulatory Inversion Using Bi-directional Long Short Term Memory. Proc. Interspeech 2018, 3122-3126, DOI: 10.21437/Interspeech.2018-1843.

BibTeX Entry:

@inproceedings{Illa2018,
  author={Aravind Illa and Prasanta Kumar Ghosh},
  title={Low Resource Acoustic-to-articulatory Inversion Using Bi-directional Long Short Term Memory},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3122--3126},
  doi={10.21437/Interspeech.2018-1843},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1843}
}