Publication Details
Toroidal Probabilistic Spherical Discriminant Analysis
Brummer Johan Nikolaas Langenhoven, Dr. (AmazonCom)
Swart Albert du Preez (Speechly)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
speaker recognition, PSDA, Von Mises-Fisher
In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring back-ends are commonly used, namely cosine scoring and PLDA. We have recently proposed PSDA, an analog to PLDA that uses Von Mises-Fisher distributions instead of Gaussians. In this paper, we present toroidal PSDA (T-PSDA). It extends PSDA with the ability to model within- and between-speaker variabilities in toroidal submanifolds of the hypersphere. Like PLDA and PSDA, the model allows closed-form scoring and closed-form EM updates for training. On VoxCeleb, we find T-PSDA accuracy on par with cosine scoring, while PLDA accuracy is inferior. On NIST SRE'21 we find that T-PSDA gives large accuracy gains compared to both cosine scoring and PLDA.
@INPROCEEDINGS{FITPUB13052,
   author = "Anna Silnova and Brummer, Johan Nikolaas Langenhoven and Swart, Albert du Preez and Luk\'{a}\v{s} Burget",
   title = "Toroidal Probabilistic Spherical Discriminant Analysis",
   pages = "1--5",
   booktitle = "Proceedings of ICASSP 2023",
   year = 2023,
   location = "Rhodes Island, GR",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-7281-6327-7",
   doi = "10.1109/ICASSP49357.2023.10095580",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13052"
}