Publication Details

Language models for automatic speech recognition of Czech lectures

MIKOLOV Tomáš. LANGUAGE MODELS FOR AUTOMATIC SPEECH RECOGNITION OF CZECH LECTURES. In: Proc. STUDENT EEICT 2008. Brno: Faculty of Electrical Engineering and Communication BUT, 2008, pp. 1-5. ISBN 978-80-214-3617-6.
Czech title
Jazykové modely pro rozpoznávání českých přednášek
Type
conference paper
Language
English
Authors
Mikolov Tomáš, Ing. (DCGM FIT BUT)
Keywords

language modeling

Abstract

This paper addresses language models for automatic speech recognition of Czech lectures.

Annotation

This paper describes improvements in automatic speech recognition (ASR) of Czech lectures obtained by enhancing the language models. Our baseline is a statistical trigram language model with Good-Turing smoothing, trained on half a billion words from newspapers, books, and similar sources. Adding more training data improves accuracy by over 10% absolute, and advanced language modeling techniques, mainly neural networks, yield another 3%. Perplexity improvements and the reduction of out-of-vocabulary (OOV) words are also discussed.
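
The two evaluation measures named in the annotation, perplexity and OOV rate, can be illustrated with a minimal Python sketch. It uses their standard textbook definitions and is not the paper's code; the function names `oov_rate` and `perplexity` and the toy data are assumptions made for illustration only.

```python
import math

def oov_rate(tokens, vocabulary):
    """Fraction of test tokens that are not in the recognizer's vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocabulary) / len(tokens)

def perplexity(word_probs):
    """Exponential of the average negative log-probability per word."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))

# Toy data, invented for illustration (not taken from the paper).
vocab = {"dnes", "budeme", "mluvit", "o"}
test_tokens = ["dnes", "budeme", "mluvit", "o", "perplexitě"]

print(oov_rate(test_tokens, vocab))          # 1 of 5 tokens is out of vocabulary
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # uniform 1/4 per word: perplexity of about 4
```

In practice the per-word probabilities would come from the language model under evaluation (for example a smoothed trigram model), and a lower perplexity or OOV rate on held-out lecture transcripts would generally indicate a better match to the domain.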

Published
2008
Pages
1-5
Proceedings
Proc. STUDENT EEICT 2008
Conference
Student EEICT 2008, Brno, CZ
ISBN
978-80-214-3617-6
Publisher
Faculty of Electrical Engineering and Communication BUT
Place
Brno, CZ
BibTeX
@INPROCEEDINGS{FITPUB8749,
   author = "Tom\'{a}\v{s} Mikolov",
   title = "Language models for automatic speech recognition of Czech lectures",
   pages = "1--5",
   booktitle = "Proc. STUDENT EEICT 2008",
   year = 2008,
   location = "Brno, CZ",
   publisher = "Faculty of Electrical Engineering and Communication BUT",
   ISBN = "978-80-214-3617-6",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/8749"
}