Publication Details
Variational Approximation of Long-span Language Models for LVCSR
Deoras Anoop (JHU)
Mikolov Tomáš, Ing. (DCGM FIT BUT)
Kombrink Stefan, Dipl.-Inf -Ling (DCGM FIT BUT)
Karafiát Martin, Ing., Ph.D. (DCGM FIT BUT)
Khudanpur Sanjeev (JHU)
Recurrent Neural Network, Language Model, Variational Inference
We have presented experimental evidence that (n-gram) variational approximations of long-span LMs yield greater accuracy in LVCSR than standard n-gram models estimated from the same training text.
Long-span language models that capture syntax and semantics are seldom used in the first pass of large vocabulary continuous speech recognition systems due to the prohibitive search space of sentence hypotheses. Instead, an N-best list of hypotheses is created using tractable n-gram models and rescored using the long-span models. It is shown in this paper that computationally tractable variational approximations of the long-span models are a better choice than standard n-gram models for first-pass decoding. They not only result in a better first-pass output, but also produce a lattice with a lower oracle word error rate, and rescoring the N-best list from such lattices with the long-span models requires a smaller N to attain the same accuracy. Empirical results on the WSJ, MIT Lectures, NIST 2007 Meeting Recognition and NIST 2001 Conversational Telephone Recognition data sets are presented to support these claims.
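The abstract describes replacing a long-span language model (such as a recurrent neural network LM) with a tractable n-gram model for first-pass decoding. Below is a minimal sketch of one way such an approximation can be obtained, assuming the n-gram model is estimated from text sampled out of the long-span model; the class LongSpanLM, its sample_sentence method, and the helper approximate_with_ngrams are hypothetical placeholders for illustration, not the paper's implementation.

# Hedged sketch: approximate a long-span LM by an n-gram model estimated
# from sampled text. All names here are illustrative placeholders.
import random
from collections import defaultdict


class LongSpanLM:
    """Placeholder for a long-span LM (e.g., an RNN LM) that can generate sentences."""

    def __init__(self, vocab):
        self.vocab = vocab

    def sample_sentence(self, max_len=20):
        # Stand-in sampler; a real long-span model would condition each word
        # on the entire sentence history.
        n = random.randint(1, max_len)
        return ["<s>"] + [random.choice(self.vocab) for _ in range(n)] + ["</s>"]


def approximate_with_ngrams(lm, order=3, num_samples=100000):
    """Estimate n-gram probabilities from text sampled out of the long-span model."""
    counts = defaultdict(int)
    context_counts = defaultdict(int)
    for _ in range(num_samples):
        sent = lm.sample_sentence()
        for i in range(len(sent) - order + 1):
            ngram = tuple(sent[i:i + order])
            counts[ngram] += 1
            context_counts[ngram[:-1]] += 1
    # Maximum-likelihood estimates; a real system would also smooth these.
    return {ng: c / context_counts[ng[:-1]] for ng, c in counts.items()}


if __name__ == "__main__":
    lm = LongSpanLM(vocab=["the", "cat", "sat", "on", "mat"])
    ngram_lm = approximate_with_ngrams(lm, order=3, num_samples=1000)
    print(len(ngram_lm), "trigram probabilities estimated from samples")

The resulting n-gram table can be plugged into a standard first-pass decoder, which is the tractability argument the abstract makes; the sampling step here is only a schematic stand-in for the variational approximation evaluated in the paper.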
@INPROCEEDINGS{FITPUB9659,
  author    = "Anoop Deoras and Tom\'{a}\v{s} Mikolov and Stefan Kombrink and Martin Karafi\'{a}t and Sanjeev Khudanpur",
  title     = "Variational Approximation of Long-span Language Models for LVCSR",
  pages     = "5532--5535",
  booktitle = "Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011",
  year      = 2011,
  location  = "Praha, CZ",
  publisher = "IEEE Signal Processing Society",
  ISBN      = "978-1-4577-0537-3",
  language  = "english",
  url       = "https://www.fit.vut.cz/research/publication/9659"
}