Publication Details

QUESST2014: Evaluating Query-By-Example Speech Search in a Zero-Resource Setting with Real-Life Queries

ANGUERA Xavier, RODRIGUEZ-FUENTES Luis J., BUZO Andi, METZE Florian, SZŐKE Igor and PENAGARIKANO Mikel. QUESST2014: Evaluating Query-By-Example Speech Search in a Zero-Resource Setting with Real-Life Queries. In: Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. South Brisbane, Queensland: IEEE Signal Processing Society, 2015, pp. 5833-5837. ISBN 978-1-4673-6997-8.

Czech title

QUESST 2014: Vyhodnocení vyhledávání v řeči pomocí hlasových dotazů na úloze bez trénovacích dat s reálnými dotazy

Type

conference paper

Language

English

Authors

Anguera Xavier (Telefónica)
Rodriguez-Fuentes Luis J. (EHU)
Buzo Andi (UPB)
Metze Florian (CMU)
Szőke Igor, Ing., Ph.D. (DCGM FIT BUT)
Penagarikano Mikel (EHU)

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2015/anguera_icassp2015_0005833.pdf

Keywords

low-resource speech recognition, query-by-example speech search, spoken term detection

Abstract

This paper describes the "Query-by-Example Speech Search Task"
(QUESST), held as part of the 2014 MediaEval benchmark campaign.
The purpose of the evaluation was to perform language-independent
search on speech by using speech queries.

Annotation

In this paper, we present the task and describe the main findings of the 2014 "Query-by-Example Speech Search Task" (QUESST) evaluation. The purpose of QUESST was to perform language-independent search of spoken queries on spoken documents, while targeting languages or acoustic conditions for which very few speech resources are available. This evaluation investigated for the first time the performance of query-by-example search against morphological and morpho-syntactic variability, requiring participants to match variants of a spoken query in several languages of different morphological complexity. Another novelty is the use of the normalized cross-entropy cost (Cnxe) as the primary performance metric, keeping Term-Weighted Value (TWV) as a secondary metric for comparison with previous evaluations. After analyzing the most competitive submissions (by five teams), we find that, although low-level "pattern matching" approaches provide the best performance for "exact" matches, "symbolic" approaches working on higher-level representations seem to perform better in more complex settings, such as matching morphological variants. Finally, optimizing the output scores for Cnxe seems to generate systems that are more robust to differences in the operating point and that also perform well in terms of TWV, whereas the opposite might not always be true.

Published

2015

Pages

5833-5837

Proceedings

Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing

Conference

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), Brisbane, AU

ISBN

978-1-4673-6997-8

Publisher

IEEE Signal Processing Society

Place

South Brisbane, Queensland, AU

DOI

10.1109/ICASSP.2015.7179090

UT WoS

000427402905191

EID Scopus

2-s2.0-84946044237

BibTeX

@INPROCEEDINGS{FITPUB10957,
   author = "Xavier Anguera and Luis J. Rodriguez-Fuentes and Andi Buzo and Florian Metze and Igor Sz\H{o}ke and Mikel Penagarikano",
   title = "QUESST2014: Evaluating Query-By-Example Speech Search in a Zero-Resource Setting with Real-Life Queries",
   pages = "5833--5837",
   booktitle = "Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing",
   year = 2015,
   location = "South Brisbane, Queensland, AU",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-4673-6997-8",
   doi = "10.1109/ICASSP.2015.7179090",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/10957"
}