Publication Details

Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?

ZHANG Lin, STAFYLAKIS Themos, LANDINI Federico Nicolás, DIEZ Sánchez Mireia, SILNOVA Anna and BURGET Lukáš. Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?. In: Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop. Québec City: International Speech Communication Association, 2024, pp. 123-130. Available from: https://www.isca-archive.org/odyssey_2024/zhang24_odyssey.pdf
Czech title
Potřebují atraktory pro neurální end-to-end diarizaci kódovat informaci o mluvčích?
Type
conference paper
Language
English
Authors
Zhang Lin, Ph.D. (IIAI)
Stafylakis Themos (OMILIA)
Landini Federico Nicolás (DCGM FIT BUT)
Diez Sánchez Mireia, M.Sc., Ph.D. (DCGM FIT BUT)
Silnova Anna, MSc., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
URL
Keywords

End-to-End Neural Diarization, Speaker Characteristic Information

Abstract

In this paper, we apply the variational information bottleneck approach to end-to-end neural diarization with encoder-decoder attractors (EEND-EDA). This allows us to investigate what information is essential for the model. EEND-EDA utilizes attractors, vector representations of speakers in a conversation. Our analysis shows that attractors do not necessarily have to contain speaker characteristic information. On the other hand, giving the attractors more freedom to allow them to encode some extra (possibly speaker-specific) information leads to small but consistent diarization performance improvements. Despite architectural differences in EEND systems, the notion of attractors and frame embeddings is common to most of them and not specific to EEND-EDA. We believe that the main conclusions of this work can apply to other variants of EEND. Thus, we hope this paper will be a valuable contribution to guide the community to make more informed decisions when designing new systems.

Published
2024
Pages
123-130
Proceedings
Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop
Conference
Odyssey 2024: The Speaker and Language Recognition Workshop, Québec City, CA
Publisher
International Speech Communication Association
Place
Québec City, CA
DOI
10.21437/odyssey.2024-18
BibTeX
@INPROCEEDINGS{FITPUB13306,
   author = "Lin Zhang and Themos Stafylakis and Federico Nicol\'{a}s Landini and Mireia {Diez S\'{a}nchez} and Anna Silnova and Luk\'{a}\v{s} Burget",
   title = "Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?",
   pages = "123--130",
   booktitle = "Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop",
   year = 2024,
   location = "Qu\'{e}bec City, CA",
   publisher = "International Speech Communication Association",
   doi = "10.21437/odyssey.2024-18",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13306"
}