Bayesian HMM based x-vector clustering

Type

software

Language

English

Authors

Diez Sánchez Mireia, M.Sc., Ph.D., DCGM (FIT)
Landini Federico Nicolás, Ph.D., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)

Description

Diarization is the task of determining the number of speakers in a recording and "who speaks when"; it is part of speech data mining. This software is a full implementation of a Bayesian approach to speaker diarization that uses low-dimensional neural representations of speakers (x-vectors) extracted from individual segments. It follows the Brno University of Technology (BUT) recipe for Track 1 of the Second DIHARD Diarization Challenge, which BUT won. The pipeline consists of computing filter-bank features, computing x-vectors, performing Agglomerative Hierarchical Clustering on the x-vectors as a first step to produce an initialization, applying a Variational Bayes HMM over the x-vectors to produce the diarization output, and scoring that output. The software is written in Python and released as open source under the Apache License.
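
The Agglomerative Hierarchical Clustering initialization step can be illustrated with a minimal sketch. It is not the VBx code: the function name `ahc_init`, the cosine distance, the average linkage, the threshold value, and the synthetic vectors standing in for x-vectors are all assumptions made for illustration, and the Variational Bayes HMM refinement that follows in the real pipeline is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def ahc_init(xvectors, threshold=0.5):
    """Hypothetical helper: average-linkage AHC on cosine distance.

    Returns one integer speaker label (starting at 0) per segment.
    """
    # Length-normalise so that only the direction of each vector matters.
    norms = np.linalg.norm(xvectors, axis=1, keepdims=True)
    unit = xvectors / norms
    # Build the merge tree bottom-up; every segment starts as its own cluster.
    tree = linkage(unit, method="average", metric="cosine")
    # Cut the tree where clusters are further apart than the threshold
    # (an assumed value, not one taken from the source).
    labels = fcluster(tree, t=threshold, criterion="distance")
    return labels - 1  # fcluster labels start at 1


# Synthetic stand-in for x-vectors: two "speakers", 20 segments each.
rng = np.random.default_rng(0)
centres = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
xvectors = np.vstack([c + 0.05 * rng.standard_normal((20, 3)) for c in centres])
labels = ahc_init(xvectors)
```

In a pipeline of the kind described above, labels like these would serve only as the starting assignment that the Variational Bayes HMM then refines.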

Keywords

Speaker Diarization, Variational Bayes, HMM, x-vector, DIHARD

URL

https://github.com/BUTSpeechFIT/VBx

License

Use of the result by another entity is possible without acquiring a license (the result is not licensed)

License Fee

The licensor does not require a license fee for the result

Projects

IT4Innovations excellence in science, MŠMT, National Sustainability Programme II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed
Moderní metody zpracování, analýzy a zobrazování multimediálních a 3D dat (Modern methods for processing, analysis and visualization of multimedia and 3D data), BUT, BUT internal projects, FIT-S-20-6460, start: 2020-03-01, end: 2023-02-28, completed
Neural Representations in multi-modal and multi-lingual modeling, GACR, Grant projects of excellence in basic research EXPRO - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed
Robust SPEAKER DIarization systems using Bayesian inferenCE and deep learning methods, EU, Horizon 2020, start: 2017-03-01, end: 2019-02-28, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)
