Publication Details

Rich System Combination For Keyword Spotting In Noisy and Acoustically Heterogenous Audio Streams

AKBACAK Murat, BURGET Lukáš, WENG Wan and VAN Hout Julien. Rich System Combination For Keyword Spotting In Noisy and Acoustically Heterogenous Audio Streams. In: Proceedings of ICASSP 2013. Vancouver: IEEE Signal Processing Society, 2013, pp. 8267-8271. ISBN 978-1-4799-0355-9.

Czech title

Bohatá systémová kombinace pro detekci klíčových slov v zašuměných a akusticky heterogenních audio kanálech

Type

conference paper

Language

English

Authors

Akbacak Murat (MSR)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Weng Wan (SRI)
van Hout Julien (SRI)

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2013/akbacak_icassp2013_0008267.pdf

Keywords

Keyword spotting, channel degradation, acoustic noise, robust audio search.

Abstract

In this paper we address the problem of retrieving spoken information from noisy and heterogeneous audio archives using rich system combination with a diverse set of robust modules and audio characterization.

Annotation

We address the problem of retrieving spoken information from noisy and heterogeneous audio archives using system combination with a rich and diverse set of noise-robust modules. Audio search applications so far have focused on constrained domains or genres and on relatively clean, homogeneous acoustic or channel conditions. In this paper, our focus is to improve the accuracy of a keyword spotting system in highly degraded and diverse channel conditions by employing multiple recognition systems in parallel with different robust frontends and modeling choices, as well as different representations during audio indexing and search (words vs. subword units). After aligning keyword hits from the different systems, we combine them at the score level using a logistic-regression-based classifier. Side information, such as the output of an acoustic condition identification module, is used to guide the combination system, which is trained on a held-out dataset. Lattice-based indexing and search is used in all keyword spotting systems. We present improvements in miss probability at a fixed false-alarm probability by employing our proposed rich system combination approach on DARPA Robust Automatic Transcription of Speech (RATS) Phase-I evaluation data, which contains highly degraded channel recordings (signal-to-noise ratio levels as low as 0 dB) and different channel characteristics.
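
The score-level combination described above can be illustrated with a minimal sketch. It is not the authors' implementation: the three-system setup, the synthetic held-out data, the one-hot encoding of the acoustic condition, the plain gradient-descent trainer, and all function names (make_features, train_logreg, fuse) are assumptions made for illustration only. The annotation does not say how the side information enters the classifier; here it is simply appended to the per-system scores.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_features(system_scores, condition_id, n_conditions):
    # Assumption: one score per system for an aligned keyword hit,
    # followed by a one-hot vector for the identified acoustic condition.
    one_hot = [1.0 if c == condition_id else 0.0 for c in range(n_conditions)]
    return list(system_scores) + one_hot


def train_logreg(X, y, lr=0.5, epochs=1000):
    # Batch gradient descent on the mean logistic loss.
    n, n_feat = len(X), len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fuse(w, b, x):
    # Fused detection score in (0, 1) for one aligned keyword hit.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic stand-in for a held-out set (hypothetical data, not RATS):
# true hits get higher per-system scores than false alarms.
rng = random.Random(0)
N_SYSTEMS, N_CONDITIONS = 3, 2
X, y = [], []
for _ in range(200):
    label = rng.randint(0, 1)
    cond = rng.randrange(N_CONDITIONS)
    lo, hi = (0.5, 1.0) if label else (0.0, 0.5)
    scores = [rng.uniform(lo, hi) for _ in range(N_SYSTEMS)]
    X.append(make_features(scores, cond, N_CONDITIONS))
    y.append(label)

w, b = train_logreg(X, y)
strong = fuse(w, b, make_features([0.9, 0.8, 0.85], 0, N_CONDITIONS))
weak = fuse(w, b, make_features([0.1, 0.0, 0.2], 1, N_CONDITIONS))
print(strong, weak)
```

In this sketch a system that did not return a given hit would contribute a score of 0.0, which is one possible convention rather than something stated in the annotation. Thresholding the fused score then sets the operating point, trading miss probability against false-alarm probability.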

Published

2013

Pages

8267-8271

Proceedings

Proceedings of ICASSP 2013

Conference

38th International Conference on Acoustics, Speech, and Signal Processing, Vancouver, CA

ISBN

978-1-4799-0355-9

Publisher

IEEE Signal Processing Society

Place

Vancouver, CA

BibTeX

@INPROCEEDINGS{FITPUB10339,
   author = "Murat Akbacak and Luk\'{a}\v{s} Burget and Wan Weng and Julien van Hout",
   title = "Rich System Combination For Keyword Spotting In Noisy and Acoustically Heterogenous Audio Streams",
   pages = "8267--8271",
   booktitle = "Proceedings of ICASSP 2013",
   year = 2013,
   location = "Vancouver, CA",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-4799-0355-9",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/10339"
}