Publication Details
Sub-word modeling of out of vocabulary words in spoken term detection
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
Fapšo Michal, Ing. (DCGM FIT BUT)
phone, multigram, spoken term detection, subword, keyword spotting, syllable, lattice
The work deals with sub-word modeling of out-of-vocabulary words in spoken term detection.
This paper compares sub-word-based methods for the spoken term detection (STD) task and for phone recognition. Sub-word units are needed to search for out-of-vocabulary (OOV) words. We compared words, phones and multigrams. First, the maximal length and the pruning of multigrams were investigated. Then, two constrained methods of multigram training were proposed. The evaluation was done on the NIST STD06 dev-set CTS data. The proposed method improves phone accuracy by more than 9% relative and STD accuracy by more than 7% relative.
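Multigrams are variable-length sequences of phones, so the two parameters studied first (maximal length and pruning) shape the unit inventory directly. The sketch below shows, under stated assumptions, where these two knobs enter a simple Viterbi approximation of multigram training. It is not the paper's procedure: classic multigram estimation uses EM over all segmentations, and the two constrained training methods proposed in the paper are not reproduced here. The function names, the `max_len` and `min_count` parameters, and the floor count that keeps every single phone in the inventory are illustrative assumptions.

```python
import math
from collections import Counter


def viterbi_segment(phones, logprob, max_len):
    """Best segmentation of `phones` into multigrams of length <= max_len.

    `logprob` maps a multigram (tuple of phones) to its log-probability;
    units missing from the dictionary cannot be used.
    """
    n = len(phones)
    best = [-math.inf] * (n + 1)  # best[i]: best score for phones[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last unit
    best[0] = 0.0
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            unit = tuple(phones[start:end])
            if unit in logprob and best[start] > -math.inf:
                score = best[start] + logprob[unit]
                if score > best[end]:
                    best[end], back[end] = score, start
    if best[n] == -math.inf:
        raise ValueError("no segmentation with the given inventory")
    units, end = [], n
    while end > 0:  # follow back-pointers from the end of the utterance
        start = back[end]
        units.append(tuple(phones[start:end]))
        end = start
    return units[::-1]


def estimate(counts, phone_units, min_count):
    """Prune rare units, then turn counts into log-probabilities."""
    kept = {u: c for u, c in counts.items() if c >= min_count}
    for unit in phone_units:
        kept.setdefault(unit, 1)  # assumed floor: every utterance stays segmentable
    total = sum(kept.values())
    return {u: math.log(c / total) for u, c in kept.items()}


def viterbi_train(corpus, max_len, iterations=3, min_count=2):
    """Viterbi-style multigram training (hypothetical simplification of EM)."""
    phone_units = {(p,) for utt in corpus for p in utt}
    # Initial inventory: every phone sequence of length <= max_len in the data.
    counts = Counter(
        tuple(utt[i:i + length])
        for utt in corpus
        for length in range(1, max_len + 1)
        for i in range(len(utt) - length + 1)
    )
    logprob = estimate(counts, phone_units, min_count)
    for _ in range(iterations):
        # Re-segment the corpus with the current model and re-count units.
        counts = Counter(
            unit for utt in corpus for unit in viterbi_segment(utt, logprob, max_len)
        )
        logprob = estimate(counts, phone_units, min_count)
    return logprob
```

For example, `viterbi_train([list("abab"), list("abab"), list("abc")], max_len=2)` learns the unit `("a", "b")`, and `viterbi_segment(list("abab"), model, 2)` then returns `[("a", "b"), ("a", "b")]`. Raising `max_len` lets longer units into the inventory, while raising `min_count` prunes more of them; single phones are always kept so that an OOV pronunciation can still be covered.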
@INPROCEEDINGS{FITPUB8840, author = "Igor Sz\H{o}ke and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y} and Michal Fap\v{s}o", title = "Sub-word modeling of out of vocabulary words in spoken term detection", pages = 4, booktitle = "Proc. 2008 IEEE Workshop on Spoken Language Technology", year = 2008, location = "Goa, IN", publisher = "IEEE Signal Processing Society", ISBN = "978-1-4244-3472-5", language = "english", url = "https://www.fit.vut.cz/research/publication/8840" }