Publication Details
Hybrid word-subword decoding for spoken term detection
Fapšo Michal, Ing. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
spoken term detection
The paper describes hybrid word-subword decoding for spoken term detection.
This paper deals with a hybrid word-subword recognition system for spoken term detection. The decoding is driven by a hybrid recognition network, and the decoder directly produces hybrid word-subword lattices. One phone model and two multigram models were tested to represent subword units. The systems were evaluated in terms of spoken term detection accuracy and index size. We concluded that the best subword model for hybrid word-subword recognition is the multigram model trained on the word recognizer vocabulary. We achieved an improvement in word recognition accuracy, and in spoken term detection accuracy when in-vocabulary and out-of-vocabulary terms are searched separately. Spoken term detection accuracy with the full (in-vocabulary and out-of-vocabulary) term set was slightly worse, but the required index size was significantly reduced.
@INPROCEEDINGS{FITPUB8729,
   author = "Igor Sz\H{o}ke and Michal Fap\v{s}o and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "Hybrid word-subword decoding for spoken term detection",
   pages = 4,
   booktitle = "Proc. SSCS 2008: Speech search workshop at SIGIR",
   year = 2008,
   location = "Singapore, SG",
   publisher = "Association for Computing Machinery",
   ISBN = "978-90-365-2697-5",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/8729"
}