DA-IICT/IIITV System for Low Resource Speech Recognition Challenge 2018

Hardik B. Sailor, Maddala Venkata Siva Krishna, Diksha Chhabra, Ankur T. Patil, Madhu Kamble and Hemant Patil

Abstract:

This paper presents an Automatic Speech Recognition (ASR) system for the Gujarati language, developed for the Low Resource Speech Recognition Challenge for Indian Languages at INTERSPEECH 2018. At the front-end, Amplitude Modulation (AM) features are extracted using standard and data-driven auditory filterbanks. Recurrent Neural Network Language Models (RNNLMs) are used for language modeling, yielding relative perplexity improvements of 36.18% and 40.95% on the test and blind test sets, respectively, compared to a 3-gram LM. Time Delay Neural Network (TDNN) and TDNN-Long Short-Term Memory (LSTM) models are employed for acoustic modeling. The statistical significance of the proposed approaches is assessed using a bootstrap-based % Probability of Improvement (POI) measure. RNNLM rescoring of the 3-gram LM output gave an absolute reduction of 0.69-1.29% in Word Error Rate (WER) across the various feature sets. AM features extracted using the gammatone filterbank (AM-GTFB) outperformed the FBANK baseline on the blind test set (POI > 70%). The combination of ASR systems further improved performance, with absolute WER reductions of 1.89% and 2.24% on the test and blind test sets, respectively (100% POI).
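The bootstrap-based % POI measure quantifies how often one system beats another when the evaluation utterances are resampled with replacement. Below is a minimal Python sketch of this idea, assuming per-utterance word-error counts are available for both systems; the function name, inputs, and resample count are illustrative assumptions, not details taken from the paper.

import numpy as np

def bootstrap_poi(errors_a, errors_b, n_resamples=10_000, seed=0):
    """% Probability of Improvement (POI) of system B over system A,
    estimated by bootstrap resampling of per-utterance error counts.
    Inputs are hypothetical: aligned arrays of word errors per utterance.
    """
    rng = np.random.default_rng(seed)
    errors_a = np.asarray(errors_a, dtype=float)
    errors_b = np.asarray(errors_b, dtype=float)
    n = len(errors_a)
    wins = 0
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)   # resample utterances with replacement
        if errors_b[idx].sum() < errors_a[idx].sum():
            wins += 1                      # B makes fewer errors on this resample
    return 100.0 * wins / n_resamples      # POI as a percentage

# Toy usage (hypothetical counts): B wins on most resamples, so POI nears 100%.
# poi = bootstrap_poi([3, 2, 4, 1], [2, 1, 3, 1])

A POI near 100%, as reported for the system combination, means the improvement holds in essentially every bootstrap resample, i.e., it is statistically robust.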


Cite as: Sailor, H.B., Venkata Siva Krishna, M., Chhabra, D., Patil, A.T., Kamble, M., Patil, H. (2018) DA-IICT/IIITV System for Low Resource Speech Recognition Challenge 2018. Proc. Interspeech 2018, 3187-3191, DOI: 10.21437/Interspeech.2018-1553.


BibTeX Entry:

@inproceedings{Sailor2018,
  author    = {Hardik B. Sailor and Maddala {Venkata Siva Krishna} and Diksha Chhabra and Ankur T. Patil and Madhu Kamble and Hemant Patil},
  title     = {{DA-IICT/IIITV} System for Low Resource Speech Recognition Challenge 2018},
  year      = {2018},
  booktitle = {Proc. Interspeech 2018},
  pages     = {3187--3191},
  doi       = {10.21437/Interspeech.2018-1553},
  url       = {http://dx.doi.org/10.21437/Interspeech.2018-1553}
}