Publication Details

Multi-Channel Extension of Pre-trained Models for Speaker Verification

MOŠNER Ladislav, SERIZEL Romain, BURGET Lukáš, PLCHOT Oldřich, VINCENT Emmanuel, PENG Junyi and ČERNOCKÝ Jan. Multi-Channel Extension of Pre-trained Models for Speaker Verification. In: Proceedings of Interspeech 2024. Kos: International Speech Communication Association, 2024, pp. 2135-2139. ISSN 1990-9772. Available from: https://www.isca-archive.org/interspeech_2024/mosner24_interspeech.pdf
Czech title
Vícekanálové rozšíření předtrénovaných modelů pro ověřování mluvčího
Type
conference paper
Language
English
Authors
Mošner Ladislav, Ing. (DCGM FIT BUT)
Serizel Romain (LORIA)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Vincent Emmanuel (LORIA)
Peng Junyi, MSc. Eng. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
URL
Keywords

multi-channel speaker verification, pre-trained models

Abstract

In this work, we focus on designing a multi-channel speech processing system based on large pre-trained models. These models are typically trained for single-channel scenarios via self-supervised learning (SSL). A common approach to using the SSL models with microphone array data is to prepend it with a multi-channel speech enhancement. The downside is that spatial information can be leveraged only by the pre-processing stage, and enhancement errors get propagated to the SSL model. We aim to alleviate the issue by designing METRO, a Multi-channel ExTension of pRe-trained mOdels. It interleaves per-channel processing with cross-channel information exchange, eventually fusing channels into one. While our approach is general, here we focus on multi-channel speaker verification. Our experiments on the MultiSV corpus show noteworthy improvements over the best-published results on the dataset.

Published
2024
Pages
2135-2139
Journal
Proceedings of Interspeech - on-line, vol. 2024, no. 9, ISSN 1990-9772
Proceedings
Proceedings of Interspeech 2024
Conference
Interspeech Conference, Kos, GR
Publisher
International Speech Communication Association
Place
Kos, GR
DOI
10.21437/Interspeech.2024-1260
BibTeX
@INPROCEEDINGS{FITPUB13322,
   author = "Ladislav Mo\v{s}ner and Romain Serizel and Luk\'{a}\v{s} Burget and Old\v{r}ich Plchot and Emmanuel Vincent and Junyi Peng and Jan \v{C}ernock\'{y}",
   title = "Multi-Channel Extension of Pre-trained Models for Speaker Verification",
   pages = "2135--2139",
   booktitle = "Proceedings of Interspeech 2024",
   journal = "Proceedings of Interspeech - on-line",
   volume = 2024,
   number = 9,
   year = 2024,
   location = "Kos, GR",
   publisher = "International Speech Communication Association",
   ISSN = "1990-9772",
   doi = "10.21437/Interspeech.2024-1260",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13322"
}