Thesis Details
Query-by-Example Spoken Term Detection
This thesis investigates query-by-example (QbE) spoken term detection (STD). Queries are entered in their spoken form and searched for in a pool of recorded spoken utterances; the search returns a list of detections with their scores and timing. We describe, analyze and compare three different approaches to QbE STD in various language-dependent and language-independent setups with diverse audio conditions, using either a single example or five examples per query.
For our experiments we used Czech, Hungarian, English and Levantine data, and for each language we trained a 3-state phone posterior estimator. This gave us 16 possible combinations of evaluation language and posterior-estimator language, of which 4 were language-dependent and 12 were language-independent. All QbE systems were evaluated on the same data and the same features using two metrics: the non-pooled Figure-of-Merit and our proposed utterance-normalized non-pooled Figure-of-Merit. These metrics provided relevant data for comparing the QbE approaches and for gaining better insight into their behavior.
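The count of setups follows from pairing each evaluation language with each posterior-estimator language. The short sketch below only illustrates that arithmetic; the variable names are hypothetical and it is not part of the thesis's tooling.

```python
from itertools import product

# The four languages named in the abstract.
languages = ["Czech", "Hungarian", "English", "Levantine"]

# Each pair is (evaluation language, language of the posterior estimator).
pairs = list(product(languages, repeat=2))

# A setup is language-dependent when both languages match.
dependent = [(ev, est) for ev, est in pairs if ev == est]
independent = [(ev, est) for ev, est in pairs if ev != est]

print(len(pairs), len(dependent), len(independent))  # 16 4 12
```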
Query-by-Example, Spoken Term Detection, Finite State Transducers, System comparison, Language dependency, Low-resource languages
@phdthesis{FITPT282,
  author = "Michal Fap\v{s}o",
  type = "Ph.D. thesis",
  title = "Query-by-Example Spoken Term Detection",
  school = "Brno University of Technology, Faculty of Information Technology",
  year = 2014,
  location = "Brno, CZ",
  language = "english",
  url = "https://www.fit.vut.cz/study/phd-thesis/282/"
}