Publication Details

ABC SYSTEM DESCRIPTION FOR NIST SRE 2024

ALAM Jahangir, BARAHONA Quirós Sara, BOBOŠ Dominik, BURGET Lukáš, CUMANI Sandro, DAHMANE Mohamed, HAN Jiangyu, HLAVÁČEK Miroslav, KODOVSKÝ Martin, LANDINI Federico Nicolás, MOŠNER Ladislav, PÁLKA Petr, PAVLÍČEK Tomáš, PENG Junyi, PLCHOT Oldřich, RAJASEKHAR Gnana Praveen, ROHDIN Johan A., SILNOVA Anna, STAFYLAKIS Themos and ZHANG Lin. ABC SYSTEM DESCRIPTION FOR NIST SRE 2024. In: Proceedings of NIST SRE 2024. San Juan: National Institute of Standards and Technology, 2024, pp. 1-9.
Czech title
Popis ABC systému pro NIST SRE 2024 evaluace
Type
conference paper
Language
English
Authors
Alam Jahangir (CRIM)
Barahona Quirós Sara (UAM)
Boboš Dominik (Phonexia)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Cumani Sandro (POLITO)
Dahmane Mohamed (CRIM)
Han Jiangyu, M.Eng. (DCGM FIT BUT)
Hlaváček Miroslav (Phonexia)
Kodovský Martin (Phonexia)
Landini Federico Nicolás (DCGM FIT BUT)
Mošner Ladislav, Ing. (DCGM FIT BUT)
Pálka Petr, Bc. (DCGM FIT BUT)
Pavlíček Tomáš, Ing. (Phonexia)
Peng Junyi, Msc. Eng. (DCGM FIT BUT)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Rajasekhar Gnana Praveen (CRIM)
Rohdin Johan A., Dr. (DCGM FIT BUT)
Silnova Anna, MSc., Ph.D. (DCGM FIT BUT)
Stafylakis Themos (OMILIA)
Zhang Lin, Ph.D. (FIT BUT)
URL
Keywords

NIST, speaker, recognition, evaluation

Abstract

This paper presents the ABC team's submission to the NIST SRE 2024 evaluation, a collaboration among BUT, Polito, Phonexia, Omilia, UAM, and CRIM. Our team participated in all evaluation tracks (audio-only, visual-only, and audio-visual) under both fixed and open conditions. We developed a variety of frontends, backends, and strategies for calibration and fusion to optimize system performance. The fixed and open conditions share some solutions. In the audio-only systems, we employed ResNet variants and the newly introduced ReDimNet model as frontends for embedding extraction. We then explored various backends, including cosine scoring, Probabilistic Linear Discriminant Analysis, and Pairwise Support Vector Machine. For the visual-only systems, we adopted the InsightFace framework, utilizing ResNet100 and MagFace models pre-trained on the MS1MV2 dataset. Cosine scoring under various strategies was applied, with logistic regression used for both calibration and fusion. Finally, scores from the audio-only and visual-only systems were fused using logistic regression for submission to the audio-visual track. Building on the fixed condition, the open condition included enhancements such as larger ResNet models, additional training data from the VoxBlink2 dataset, and the pre-trained XLS-R foundation model.

Published
2024
Pages
1-9
Proceedings
Proceedings of NIST SRE 2024
Conference
2024 NIST Speaker Recognition Evaluation (SRE) Workshop, Hyatt Place San Juan, 580 Ave. Manuel Fernandez Juncos, San Juan, PR 00907
Publisher
National Institute of Standards and Technology
Place
San Juan, PR
BibTeX
@INPROCEEDINGS{FITPUB13341,
   author = "Jahangir Alam and Sara Quir\'{o}s Barahona and Dominik Bobo\v{s} and Luk\'{a}\v{s} Burget and Sandro Cumani and Mohamed Dahmane and Jiangyu Han and Miroslav Hlav\'{a}\v{c}ek and Martin Kodovsk\'{y} and Federico Nicol\'{a}s Landini and Ladislav Mo\v{s}ner and Petr P\'{a}lka and Tom\'{a}\v{s} Pavl\'{i}\v{c}ek and Junyi Peng and Old\v{r}ich Plchot and Gnana Praveen Rajasekhar and Johan A. Rohdin and Anna Silnova and Themos Stafylakis and Lin Zhang",
   title = "ABC SYSTEM DESCRIPTION FOR NIST SRE 2024",
   pages = "1--9",
   booktitle = "Proceedings of NIST SRE 2024",
   year = 2024,
   location = "San Juan, PR",
   publisher = "National Institute of Standards and Technology",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13341"
}