
Session: Large Vocabulary (Language Modeling and Speech Understanding)

Title: Maximum-Likelihood Training of the PLCG-based Language Model

Authors: Dong Hoon Van Uytsel, Dirk Van Compernolle, Patrick Wambacq

Abstract: In [1] a parsing language model based on a probabilistic left-corner grammar (PLCG) was proposed, and encouraging performance on a speech recognition task using the PLCG-based language model was reported. In this paper we show how the PLCG-based language model can be further optimized by iterative parameter reestimation on unannotated training data. Precalculating the forward, inner, and outer probabilities of states in the PLCG network provides an elegant shortcut for computing the expected transition frequencies needed in each iteration of the proposed reestimation procedure. The training algorithm enables model training on very large corpora. In our experiments, test-set perplexity is close to saturation after three iterations, 5 to 16% lower than initially. However, we observed no significant improvement in recognition accuracy after reestimation.
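The reestimation step the abstract describes is EM-style: expected transition frequencies (derived from the forward, inner, and outer probabilities of the network states) are normalized per source state to yield updated transition probabilities. A minimal sketch of that normalization step, with hypothetical state and transition names (the expected counts are assumed to have been computed already; this is not the paper's actual code):

```python
from collections import defaultdict

def reestimate_transitions(expected_counts):
    """One EM reestimation step: normalize expected transition
    frequencies per source state into new transition probabilities.

    `expected_counts` maps (source_state, transition) -> expected
    frequency, as would be obtained from the forward/inner/outer
    probabilities of the PLCG network states.
    """
    totals = defaultdict(float)
    for (src, _), count in expected_counts.items():
        totals[src] += count
    return {(src, trans): count / totals[src]
            for (src, trans), count in expected_counts.items()}

# Toy expected counts (hypothetical, for illustration only):
counts = {("S", "shift"): 3.0, ("S", "attach"): 1.0, ("T", "project"): 2.0}
probs = reestimate_transitions(counts)
# Probabilities out of each source state now sum to 1.
```

Iterating this update on expected counts gathered from unannotated text is what drives the perplexity reduction reported above.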

a01dv054.ps a01dv054.pdf