Publication Details
Multi-Channel Extension of Pre-trained Models for Speaker Verification
Serizel Romain (LORIA)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Vincent Emmanuel (LORIA)
Peng Junyi, Msc. Eng. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
multi-channel speaker verification, pre-trained models
In this work, we focus on designing a multi-channel speech processing system based on large pre-trained models. These models are typically trained for single-channel scenarios via self-supervised learning (SSL). A common approach to using SSL models with microphone array data is to prepend them with a multi-channel speech enhancement stage. The downside is that spatial information can be leveraged only by the pre-processing stage, and enhancement errors are propagated to the SSL model. We aim to alleviate this issue by designing METRO, a Multi-channel ExTension of pRe-trained mOdels. It interleaves per-channel processing with cross-channel information exchange, eventually fusing the channels into one. While our approach is general, here we focus on multi-channel speaker verification. Our experiments on the MultiSV corpus show noteworthy improvements over the best published results on the dataset.
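The interleaving idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the shared linear layer, the mean-based exchange rule, the mean-based fusion, and all function names below are assumptions chosen only to show the control flow (per-channel processing, then cross-channel exchange, repeated, then fusion into a single stream).

```python
import math
import random


def per_channel_layer(channel, weight):
    """Apply the same (shared) linear map followed by tanh to every frame of one channel."""
    return [
        [math.tanh(sum(f * w for f, w in zip(frame, col))) for col in weight]
        for frame in channel
    ]


def channel_mean(channels):
    """Average over the channel axis, frame by frame and feature by feature."""
    n = len(channels)
    return [
        [sum(values) / n for values in zip(*frames)]
        for frames in zip(*channels)
    ]


def cross_channel_exchange(channels):
    """Hypothetical exchange rule: add the cross-channel mean to every channel."""
    mean = channel_mean(channels)
    return [
        [
            [v + m for v, m in zip(frame, mean_frame)]
            for frame, mean_frame in zip(channel, mean)
        ]
        for channel in channels
    ]


def multi_channel_forward(channels, weights):
    """Interleave per-channel processing with cross-channel exchange, then fuse."""
    for weight in weights:
        channels = [per_channel_layer(c, weight) for c in channels]
        channels = cross_channel_exchange(channels)
    # Hypothetical fusion: average the channels into a single stream.
    return channel_mean(channels)


random.seed(0)
num_channels, num_frames, dim = 4, 5, 3
# Toy "microphone array" features: channels x frames x features.
signal = [
    [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_frames)]
    for _ in range(num_channels)
]
# Two toy layers, each a dim x dim weight matrix stored as a list of output columns.
weights = [
    [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
    for _ in range(2)
]
fused = multi_channel_forward(signal, weights)
print(len(fused), len(fused[0]))
```

Because the per-channel layer is shared and both the exchange and the fusion are means over channels, this toy version yields the same fused output regardless of microphone ordering. That property follows from the choices made in the sketch and is not a claim about METRO itself.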
@INPROCEEDINGS{FITPUB13322,
   author = "Ladislav Mo\v{s}ner and Romain Serizel and Luk\'{a}\v{s} Burget and Old\v{r}ich Plchot and Emmanuel Vincent and Junyi Peng and Jan \v{C}ernock\'{y}",
   title = "Multi-Channel Extension of Pre-trained Models for Speaker Verification",
   pages = "2135--2139",
   booktitle = "Proceedings of Interspeech 2024",
   journal = "Proceedings of Interspeech - on-line",
   volume = 2024,
   number = 9,
   year = 2024,
   location = "Kos, GR",
   publisher = "International Speech Communication Association",
   ISSN = "1990-9772",
   doi = "10.21437/Interspeech.2024-1260",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13322"
}