On Convolutional LSTM Modeling for Joint Wake-Word Detection and Text Dependent Speaker Verification

Rajath Kumar, Vaishnavi Yeruva and Sriram Ganapathy

Abstract:

Personalized keyword detection systems that also perform text dependent speaker verification (TDSV) have received substantial interest recently. Conventional approaches develop the TDSV and wake-word detection systems separately. In this paper, we show that TDSV and keyword spotting (KWS) can be jointly modeled using a convolutional long short-term memory (CLSTM) architecture, in which an initial convolutional feature map is further processed by an LSTM recurrent network. Given a small amount of training data for developing the CLSTM system, we show that the model accurately detects the presence of the keyword in a spoken utterance. For the TDSV task, this multi-task learning (MTL) model can be well regularized using the CLSTM training examples from the personalized wake-up task. The experiments on KWS wake-up detection and TDSV use combined speech recordings from the Wall Street Journal (WSJ) and LibriSpeech corpora. In these experiments with multiple keywords, we illustrate that the proposed MTL approach significantly improves upon previously proposed neural network based text dependent SV systems. We also show experimentally that the CLSTM model provides significant improvements over previously proposed keyword detection systems (an average relative improvement of 30%).
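
Illustrative sketch: the architecture described in the abstract can be pictured as a shared convolutional-plus-LSTM trunk feeding two task heads, one for keyword classification and one for a speaker embedding (the MTL objective). The following minimal PyTorch sketch shows that structure; all layer sizes, the head names (kws_head, spk_head), and the 40-dimensional log-mel input are illustrative assumptions, not the configuration reported in the paper.

import torch
import torch.nn as nn

class CLSTM(nn.Module):
    """Sketch of a convolutional LSTM for joint KWS and TDSV.

    Input: log-mel features of shape (batch, time, n_mels).
    All sizes are illustrative assumptions, not the paper's setup.
    """

    def __init__(self, n_mels=40, conv_channels=32, lstm_hidden=128,
                 n_keywords=2, emb_dim=64):
        super().__init__()
        # Convolution over the time-frequency map extracts local patterns.
        self.conv = nn.Sequential(
            nn.Conv2d(1, conv_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),  # pool along frequency only
        )
        # The LSTM consumes the convolutional feature map frame by frame.
        self.lstm = nn.LSTM(conv_channels * (n_mels // 2), lstm_hidden,
                            batch_first=True)
        # Two heads for the multi-task objective:
        self.kws_head = nn.Linear(lstm_hidden, n_keywords + 1)  # +1: "no keyword"
        self.spk_head = nn.Linear(lstm_hidden, emb_dim)         # speaker embedding

    def forward(self, x):
        # x: (batch, time, n_mels) -> add a channel axis for Conv2d
        feat = self.conv(x.unsqueeze(1))                 # (B, C, T, F')
        b, c, t, f = feat.shape
        seq = feat.permute(0, 2, 1, 3).reshape(b, t, c * f)
        out, _ = self.lstm(seq)
        last = out[:, -1]                                # final-frame summary
        return self.kws_head(last), self.spk_head(last)

model = CLSTM()
logits, emb = model(torch.randn(4, 98, 40))  # 4 utterances, ~1 s of frames

In such a design, a keyword classification loss on kws_head and a speaker loss on spk_head would be summed during training, so the shared trunk acts as the regularizer the abstract refers to.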


Cite as: Kumar, R., Yeruva, V., Ganapathy, S. (2018) On Convolutional LSTM Modeling for Joint Wake-Word Detection and Text Dependent Speaker Verification. Proc. Interspeech 2018, 1121-1125, DOI: 10.21437/Interspeech.2018-1759.


BiBTeX Entry:

@inproceedings{Kumar2018,
  author={Rajath Kumar and Vaishnavi Yeruva and Sriram Ganapathy},
  title={On Convolutional LSTM Modeling for Joint Wake-Word Detection and Text Dependent Speaker Verification},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={1121--1125},
  doi={10.21437/Interspeech.2018-1759},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1759}
}