Publication Details
Normalising Flows for Speaker and Language Recognition Backend
Fontcuberta Espuna Aleix
Prasad Amrutha (DCGM FIT BUT)
Motlíček Petr, doc. Ing., Ph.D. (DCGM FIT BUT)
Madikeri Srikanth (IDIAP)
Schuepbach Christof (armasuisse)
Speaker recognition, Language recognition
In this paper, we address the Gaussian distribution assumption made in PLDA, a popular back-end classifier used in speaker and language recognition tasks. We study normalizing flows, which allow using non-linear transformations while still yielding a model that can explicitly represent a probability density. The model makes no assumption about the distribution of the observations. This alleviates the need for length normalization, a well-known data preprocessing step used to boost PLDA performance. We demonstrate the effectiveness of this flow model on the NIST SRE16, LRE17 and LRE22 datasets. We observe that when length normalization is applied, both the flow model and PLDA achieve similar EERs on SRE16 (11.5% vs 11.8%). However, when length normalization is not applied, the flow model is more robust and offers better EERs (13.1% vs 17.1%). For LRE17 and LRE22, the best classification accuracies (84.2%, 75.5%) are obtained by the flow model without any need for length normalization.
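The core idea above, transforming embeddings with an invertible non-linear map and evaluating an explicit density through the change-of-variables formula, can be illustrated with a short sketch. The code below is not the paper's model (its architecture and training details are not reproduced here); it is a minimal, untrained RealNVP-style affine coupling flow with hypothetical dimensions (512-dimensional embeddings, 4 coupling layers), shown alongside standard length normalization for comparison.

```python
import numpy as np

def length_norm(x):
    """Standard length normalization: project embeddings onto the unit hypersphere."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

class AffineCoupling:
    """One RealNVP-style affine coupling layer (illustrative, untrained).

    The first half of the dimensions passes through unchanged and
    parameterizes an affine transform (scale, shift) of the second half.
    The Jacobian is triangular, so its log-determinant is simply the sum
    of the per-dimension log-scales.
    """
    def __init__(self, dim, hidden=64, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.d = dim // 2
        # Tiny MLP producing [log_scale, shift] for the second half.
        self.W1 = rng.normal(0, 0.1, (self.d, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.1, (hidden, 2 * (dim - self.d)))
        self.b2 = np.zeros(2 * (dim - self.d))

    def forward(self, x):
        """Map x -> z and return (z, log|det J|) for the change of variables."""
        x1, x2 = x[..., :self.d], x[..., self.d:]
        h = np.tanh(x1 @ self.W1 + self.b1)
        log_s, t = np.split(h @ self.W2 + self.b2, 2, axis=-1)
        log_s = np.tanh(log_s)            # keep scales bounded for stability
        z2 = x2 * np.exp(log_s) + t
        z = np.concatenate([x1, z2], axis=-1)
        return z, log_s.sum(axis=-1)

def flow_log_density(x, layers):
    """log p(x) = log N(z; 0, I) + sum of per-layer log|det J|."""
    log_det = 0.0
    z = x
    for layer in layers:
        z, ld = layer.forward(z)
        log_det = log_det + ld
        # Flip dimensions between layers so alternate halves get transformed;
        # a fixed permutation is itself a flow step with log|det J| = 0.
        z = z[..., ::-1]
    d = z.shape[-1]
    log_base = -0.5 * (z ** 2).sum(axis=-1) - 0.5 * d * np.log(2 * np.pi)
    return log_base + log_det

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    xvectors = rng.normal(size=(5, 512))          # stand-in for x-vector embeddings
    flow = [AffineCoupling(512, rng=rng) for _ in range(4)]
    print(flow_log_density(xvectors, flow))               # raw embeddings
    print(flow_log_density(length_norm(xvectors), flow))  # after length-norm
```

In the paper's setting, the flow parameters would be trained to maximize the log-likelihood of the training embeddings, after which the same log-density machinery supports back-end scoring; the weights here are random purely to demonstrate the mechanics, and the point of the abstract's result is that such a model can score raw embeddings without the length-normalization step PLDA relies on.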
@INPROCEEDINGS{FITPUB13295,
   author = "Aleix Fontcuberta Espuna and Amrutha Prasad and Petr Motl\'{i}\v{c}ek and Srikanth Madikeri and Christof Schuepbach",
   title = "Normalising Flows for Speaker and Language Recognition Backend",
   pages = "74--80",
   booktitle = "Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop",
   year = 2024,
   location = "Quebec, CA",
   publisher = "International Speech Communication Association",
   doi = "10.21437/odyssey.2024-11",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13295"
}