Publication Details

Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: Theory, implementation and analysis on standard tasks

LANDINI Federico Nicolás, PROFANT Ján, DIEZ Sánchez Mireia and BURGET Lukáš. Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: Theory, implementation and analysis on standard tasks. Computer Speech and Language, vol. 71, no. 101254, 2022, pp. 1-16. ISSN 0885-2308. Available from: https://www.sciencedirect.com/science/article/pii/S0885230821000619
Czech title (English translation)
Clustering of x-vector sequences using a Bayesian hidden Markov model for speaker diarization: theory, implementation and analysis on standard tasks
Type
journal article
Language
english
Authors
Landini Federico Nicolás (DCGM FIT BUT)
Profant Ján (Phonexia)
Diez Sánchez Mireia, M.Sc., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Keywords

Speaker diarization, Variational Bayes, HMM, x-vector, AMI

Abstract

The recently proposed VBx diarization method uses a Bayesian hidden Markov model to find speaker clusters in a sequence of x-vectors. In this work, we perform an extensive comparison of the performance of VBx diarization with other approaches in the literature and show that VBx achieves superior performance on three of the most popular datasets for evaluating diarization: CALLHOME, AMI and DIHARD II. Further, we present for the first time the derivation and update formulae for the VBx model, focusing on the efficiency and simplicity of this model compared to the previous, more complex BHMM model working on standard frame-by-frame cepstral features. Together with this publication, we release the recipe for training the x-vector extractors used in our experiments on both wideband and narrowband data, and the VBx recipes that attain state-of-the-art performance on all three datasets. In addition, we point out the lack of a standardized evaluation protocol for the AMI dataset and propose a new protocol for both Beamformed and Mix-Headset audio based on the official AMI partitions and transcriptions.
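To make the first sentence of the abstract more concrete, here is a minimal Python/NumPy sketch of variational Bayes HMM clustering over an x-vector sequence. It is our own illustration, not the authors' released recipe. We assume the x-vectors have already been transformed (for example with PLDA) so that the within-speaker covariance is the identity and the between-speaker covariance is diagonal with entries `Phi`; the function names (`vbx_sketch`, `forward_backward`, `logsumexp`), the defaults (10 initial speakers, loop probability 0.9, `Fa = Fb = 1`) and the random initialization are hypothetical choices. Each iteration updates a Gaussian posterior over every speaker's latent vector, recomputes per-frame speaker log-likelihoods, runs forward-backward through an HMM whose states are speakers, and re-estimates the speaker priors `pi`, which tends to drive the priors of redundant speakers towards zero.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)  # avoid -inf - (-inf) = nan
    with np.errstate(divide="ignore"):  # log(0) -> -inf is acceptable here
        out = np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)) + m
    return np.squeeze(out, axis=axis)


def forward_backward(lls, tr, ip):
    """Log-domain forward-backward. lls: (T, S) log-likelihoods,
    tr: (S, S) transition matrix, ip: (S,) initial state probabilities."""
    with np.errstate(divide="ignore"):  # zero probabilities become -inf
        ltr, lip = np.log(tr), np.log(ip)
    T, S = lls.shape
    log_a = np.empty((T, S))
    log_b = np.zeros((T, S))
    log_a[0] = lip + lls[0]
    for t in range(1, T):  # forward pass: sum over the previous state
        log_a[t] = lls[t] + logsumexp(log_a[t - 1][:, None] + ltr, axis=0)
    for t in range(T - 2, -1, -1):  # backward pass: sum over the next state
        log_b[t] = logsumexp(ltr + (lls[t + 1] + log_b[t + 1])[None, :], axis=1)
    log_px = logsumexp(log_a[-1], axis=0)
    gamma = np.exp(log_a + log_b - log_px)  # per-frame speaker posteriors
    return gamma, log_px, log_a, log_b


def vbx_sketch(X, Phi, n_spk=10, loop_prob=0.9, Fa=1.0, Fb=1.0, max_iters=10, seed=0):
    """Hypothetical simplified VB-HMM clustering of projected x-vectors X (T, D)
    with diagonal between-speaker covariance Phi (D,). Returns gamma (T, S), pi (S,)."""
    rng = np.random.default_rng(seed)
    T, D = X.shape
    pi = np.full(n_spk, 1.0 / n_spk)
    gamma = rng.gamma(1.0, size=(T, n_spk))  # random soft initial assignment
    gamma /= gamma.sum(axis=1, keepdims=True)
    G = -0.5 * (np.sum(X ** 2, axis=1, keepdims=True) + D * np.log(2 * np.pi))
    rho = X * np.sqrt(Phi)
    for _ in range(max_iters):
        # q(y_s): diagonal inverse precision inv_L and mean alpha for each speaker
        inv_L = 1.0 / (1.0 + Fa / Fb * gamma.sum(axis=0)[:, None] * Phi)
        alpha = Fa / Fb * inv_L * (gamma.T @ rho)
        # expected log-likelihood of each frame under each speaker, shape (T, S)
        lls = Fa * (G + rho @ alpha.T - 0.5 * (inv_L + alpha ** 2) @ Phi)
        # stay with probability loop_prob, otherwise jump to speaker s with prob pi[s]
        tr = loop_prob * np.eye(n_spk) + (1.0 - loop_prob) * pi[None, :]
        gamma, log_px, log_a, log_b = forward_backward(lls, tr, pi)
        # pi update: first-frame occupancy plus expected jumps into each speaker
        jumps = np.exp(logsumexp(log_a[:-1], axis=1)[:, None] + lls[1:] + log_b[1:] - log_px)
        pi = gamma[0] + (1.0 - loop_prob) * pi * jumps.sum(axis=0)
        pi /= pi.sum()
    return gamma, pi
```

Calling `vbx_sketch(X, Phi)` on a `(T, D)` array and taking `gamma.argmax(axis=1)` gives one speaker label per x-vector; speakers whose entry in `pi` has collapsed to (near) zero can be discarded. A practical implementation would also monitor the variational lower bound to decide when to stop; this sketch simply runs a fixed number of iterations.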

Published
2022
Pages
1-16
Journal
Computer Speech and Language, vol. 71, no. 101254, ISSN 0885-2308
Publisher
Elsevier Science
DOI
10.1016/j.csl.2021.101254
UT WoS
000761599000019
BibTeX
@ARTICLE{FITPUB12619,
   author = "Landini, Federico Nicol\'{a}s and Profant, J\'{a}n and Diez S\'{a}nchez, Mireia and Burget, Luk\'{a}\v{s}",
   title = "Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: Theory, implementation and analysis on standard tasks",
   pages = "1--16",
   journal = "Computer Speech and Language",
   volume = 71,
   number = 101254,
   year = 2022,
   ISSN = "0885-2308",
   doi = "10.1016/j.csl.2021.101254",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12619"
}