Project Details
Robust SPEAKER DIarization systems using Bayesian inferenCE and deep learning methods
Project Period: 1. 3. 2017 - 28. 2. 2019
Project Type: grant
Code: 748097
Agency: European Commission EU
Program: Horizon 2020
Machine learning, statistical data processing and applications using signal processing, Numerical analysis, simulation, optimisation, modelling tools, data mining, Ontologies, neural networks, genetic programming, fuzzy logic, Cognitive science, human computer interaction, natural language processing, Complexity and cryptography, electronic security, privacy, biometrics, Speaker Diarization, Speaker Recognition, Variational Bayes Inference, Deep Neural Networks, Speech Data Mining
The proposed project deals with Speaker Diarization (SD), which is commonly defined as the task of answering the question "who spoke when?" in a speech recording. The first objective of the proposal is to optimize the Bayesian approach to SD, which has been shown to be promising for the task. Because Variational Bayes (VB) inference is very sensitive to initialization, we will develop new, fast ways of obtaining a good starting point. We will also explore alternative inference methods, such as collapsed VB or collapsed Gibbs sampling, and investigate alternative priors similar to those introduced for Bayesian speaker recognition models. The second part of the proposal is motivated by the huge performance gains that Deep Neural Networks (DNNs) have brought to other recognition tasks in recent years. In the context of SD, DNNs have been used in the computation of i-vectors, but their potential has never been explored for other stages of SD. We will study ways of integrating DNNs into the different stages of SD systems. The objectives of the proposal will be achieved through theoretical work, implementation, and careful testing on real speech data. The outcomes of the project are intended not only for scientific publications; they are also eagerly awaited by the European speech data mining industry (for example, Czech Phonexia or Spanish Agnitio). The project is proposed by an excellent female researcher, Dr. Mireia Diez, who completed her thesis in the GTTS group of the University of the Basque Country, one of the most important European labs dealing with speaker recognition and diarization. The proposed host is the Speech@FIT group of Brno University of Technology, which has a 20-year track record of top speech data mining research. The proposed research training and the combination of skills of Dr. Diez and the host institution have the potential to advance the state of the art in speaker diarization, provide the applicant with improved career opportunities, and benefit European industry.
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 748097.
2020
- MATĚJKA Pavel, PLCHOT Oldřich, GLEMBEK Ondřej, BURGET Lukáš, ROHDIN Johan A., ZEINALI Hossein, MOŠNER Ladislav, SILNOVA Anna, NOVOTNÝ Ondřej, DIEZ Sánchez Mireia and ČERNOCKÝ Jan. 13 years of speaker recognition research at BUT, with longitudinal analysis of NIST SRE. Computer Speech and Language, vol. 2020, no. 63, pp. 1-15. ISSN 0885-2308.
- DIEZ Sánchez Mireia, BURGET Lukáš, LANDINI Federico Nicolás and ČERNOCKÝ Jan. Analysis of Speaker Diarization based on Bayesian HMM with Eigenvoice Priors. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, vol. 28, no. 1, 2020, pp. 355-368. ISSN 2329-9290.
2019
- MATĚJKA Pavel, PLCHOT Oldřich, ZEINALI Hossein, MOŠNER Ladislav, SILNOVA Anna, BURGET Lukáš, NOVOTNÝ Ondřej and GLEMBEK Ondřej. Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge. In: Proceedings of Interspeech. Graz: International Speech Communication Association, 2019, pp. 2448-2452. ISSN 1990-9772.
- DIEZ Sánchez Mireia, BURGET Lukáš, WANG Shuai, ROHDIN Johan A. and ČERNOCKÝ Jan. Bayesian HMM based x-vector clustering for Speaker Diarization. In: Proceedings of Interspeech. Graz: International Speech Communication Association, 2019, pp. 346-350. ISSN 1990-9772.
2018
- PLCHOT Oldřich, MATĚJKA Pavel, NOVOTNÝ Ondřej, CUMANI Sandro, LOZANO Díez Alicia, SLAVÍČEK Josef, DIEZ Sánchez Mireia, GRÉZL František, GLEMBEK Ondřej, KAMSALI Veera Mounika, SILNOVA Anna, BURGET Lukáš, ONDEL Yang Lucas Antoine Francois, KESIRAJU Santosh and ROHDIN Johan A. Analysis of BUT-PT Submission for NIST LRE 2017. In: Proceedings of Odyssey 2018 The Speaker and Language Recognition Workshop. Les Sables d'Olonne: International Speech Communication Association, 2018, pp. 47-53. ISSN 2312-2846.
- DIEZ Sánchez Mireia, LANDINI Federico Nicolás, BURGET Lukáš, ROHDIN Johan A., SILNOVA Anna, ŽMOLÍKOVÁ Kateřina, NOVOTNÝ Ondřej, VESELÝ Karel, GLEMBEK Ondřej, PLCHOT Oldřich, MOŠNER Ladislav and MATĚJKA Pavel. BUT system for DIHARD Speech Diarization Challenge 2018. In: Proceedings of Interspeech 2018. Hyderabad: International Speech Communication Association, 2018, pp. 2798-2802. ISSN 1990-9772.
- ROHDIN Johan A., SILNOVA Anna, DIEZ Sánchez Mireia, PLCHOT Oldřich, MATĚJKA Pavel and BURGET Lukáš. End-to-End DNN Based Speaker Recognition Inspired by i-Vector and PLDA. In: Proceedings of ICASSP. Calgary: IEEE Signal Processing Society, 2018, pp. 4874-4878. ISBN 978-1-5386-4658-8.
- DIEZ Sánchez Mireia, BURGET Lukáš and MATĚJKA Pavel. Speaker Diarization based on Bayesian HMM with Eigenvoice Priors. In: Proceedings of Odyssey 2018. Les Sables d'Olonne: International Speech Communication Association, 2018, pp. 147-154. ISSN 2312-2846.
2017
- PLCHOT Oldřich, MATĚJKA Pavel, SILNOVA Anna, NOVOTNÝ Ondřej, DIEZ Sánchez Mireia, ROHDIN Johan A., GLEMBEK Ondřej, BRÜMMER Niko, SWART Albert du Preez, PRIETO Jesús J., GARCIA Perera Leibny Paola, BUERA Luis, KENNY Patrick, ALAM Jahangir and BHATTACHARYA Gautam. Analysis and Description of ABC Submission to NIST SRE 2016. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 1348-1352. ISSN 1990-9772.
- MATĚJKA Pavel, NOVOTNÝ Ondřej, PLCHOT Oldřich, BURGET Lukáš, DIEZ Sánchez Mireia and ČERNOCKÝ Jan. Analysis of Score Normalization in Multilingual Speaker Recognition. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 1567-1571. ISSN 1990-9772.
- MATĚJKA Pavel, PLCHOT Oldřich, NOVOTNÝ Ondřej, CUMANI Sandro, LOZANO Díez Alicia, SLAVÍČEK Josef, DIEZ Sánchez Mireia, GRÉZL František, GLEMBEK Ondřej, KAMSALI Veera Mounika, SILNOVA Anna, BURGET Lukáš, ONDEL Yang Lucas Antoine Francois, KESIRAJU Santosh and ROHDIN Johan A. BUT-PT System Description for NIST LRE 2017. In: Proceedings of NIST Language Recognition Workshop 2017. Orlando, Florida: National Institute of Standards and Technology, 2017, pp. 1-6.
- VESELÝ Karel, BASKAR Murali K., DIEZ Sánchez Mireia and BENEŠ Karel. MGB-3 BUT System: Low-resource ASR on Egyptian YOUTUBE data. In: Proceedings of ASRU 2017. Okinawa: IEEE Signal Processing Society, 2017, pp. 368-373. ISBN 978-1-5090-4788-8.
2020
- Bayesian HMM based x-vector clustering - VBx, software, 2020
Authors: Diez Sánchez Mireia, Landini Federico Nicolás, Burget Lukáš