Robust End-To-End SPEAKER recognition based on deep learning and attention models

Czech title

Robustní rozpoznávání SPEAKER na základě modelů hlubokého učení a pozornosti

Type

grant

Keywords

machine learning, data mining, statistical data processing and applications, numerical analysis, simulation, optimisation, modelling tools, signal processing, neural networks, connectionist systems, fuzzy logic, complexity and cryptography, electronic security, privacy, biometrics, speaker recognition, Deep Neural Networks, Attention Models, Deep Learning, Language Recognition, Speech Processing

Abstract

This project focuses on automatic speaker recognition (SID), the task of determining the identity of the speaker in a speech recording. Disentangling the speaker-specific information from the remaining nuisance variability (channel, etc.) requires complex models. Deep neural networks (DNNs) have recently shown their potential for this, as in the popular x-vector representation learnt by a DNN. Here, we aim for end-to-end SID, where the system is optimized as a whole for the target task. Despite some first steps in this direction, several aspects are still unexplored. We propose to explore recurrent approaches, which are well suited to temporal signals, as well as different pooling methods to obtain a fixed-length representation from a variable-length input sequence of speech, an important issue in the field. We also want to explore different flavors of attention mechanisms, which make the DNN focus on relevant parts of the input, providing a way to quantify how much evidence has been collected about the speaker identity and the uncertainty of the obtained representation, a critical issue when making (Bayesian) decisions in SID. Finally, other approaches, such as using the raw signal (instead of features) or other advances that might arise, will also be explored for SID and related tasks. To achieve our goals, we will start from theory, implement the proposed approaches and test them on real speech data. The outcomes are intended to benefit both the scientific community and the speech processing industry, including companies such as Phonexia or Nuance. The applicant, Dr. Alicia Lozano-Diez, is an excellent female researcher who completed her Ph.D. at Audias (Universidad Autonoma de Madrid, Spain), a respected research lab. The host group, Speech@FIT at Brno University of Technology (Czechia), has a top-class track record in speech processing research. Thus, we expect the combination of researcher and host to boost the researcher's career and to benefit the host group (and its industrial European partners).
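
As an illustration of the pooling problem mentioned in the abstract, the following is a minimal sketch of attention-weighted statistics pooling in NumPy. It is not the project's implementation: the function name `attentive_stats_pooling`, the single attention vector `w` and the `temperature` parameter are assumptions made for this example, and a real system would learn the attention parameters jointly with the rest of the network. The sketch only shows how per-frame attention weights can turn a variable-length sequence of frame-level features into a fixed-length vector.

```python
import numpy as np


def attentive_stats_pooling(frames, w, temperature=1.0):
    """Pool a (T, D) sequence of frame features into a fixed 2*D vector.

    frames: array of shape (T, D); T may differ between recordings.
    w: hypothetical attention parameter vector of shape (D,).
    Returns the pooled vector and the per-frame attention weights.
    """
    scores = frames @ w / temperature        # one scalar score per frame, shape (T,)
    scores = scores - scores.max()           # subtract the max for numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum()              # softmax: weights are positive and sum to 1
    mean = alpha @ frames                    # attention-weighted mean, shape (D,)
    var = alpha @ (frames - mean) ** 2       # attention-weighted variance, shape (D,)
    std = np.sqrt(np.maximum(var, 1e-10))    # floor the variance before the square root
    return np.concatenate([mean, std]), alpha


# Two "recordings" of different lengths map to vectors of the same size.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
short_utt = rng.normal(size=(50, 8))
long_utt = rng.normal(size=(300, 8))
emb_short, alpha_short = attentive_stats_pooling(short_utt, w)
emb_long, alpha_long = attentive_stats_pooling(long_utt, w)
print(emb_short.shape, emb_long.shape)
```

With `w` set to all zeros the weights become uniform and the pooled mean reduces to the plain average over frames, which is the non-attentive baseline this kind of pooling generalizes.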

Team members

Lozano Díez Alicia, Ph.D. (UPGM FIT VUT), research leader
Burget Lukáš, doc. Ing., Ph.D. (UPGM FIT VUT), team leader

Support

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 843627.

Publications

2021

LANDINI Federico Nicolás, LOZANO Díez Alicia, BURGET Lukáš, DIEZ Sánchez Mireia, SILNOVA Anna, ŽMOLÍKOVÁ Kateřina, GLEMBEK Ondřej, MATĚJKA Pavel, STAFYLAKIS Themos and BRUMMER Johan Nikolaas Langenhoven. BUT System Description for The Third DIHARD Speech Diarization Challenge. In: Proceedings available at Dihard Challenge Github. on-line by LDC and University of Pennsylvania, 2021, pp. 1-5.

2020

ALAM Jahangir, BOULIANNE Gilles, BURGET Lukáš, DAHMANE Mohamed, DIEZ Sánchez Mireia, GLEMBEK Ondřej, LALONDE Marc, LOZANO Díez Alicia, MATĚJKA Pavel, MIZERA Petr, MOŠNER Ladislav, NOISEUX Cédric, MONTEIRO Joao, NOVOTNÝ Ondřej, PLCHOT Oldřich, ROHDIN Johan A., SILNOVA Anna, SLAVÍČEK Josef, STAFYLAKIS Themos, ST-CHARLES Pierre-Luc, WANG Shuai and ZEINALI Hossein. Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge. In: Proceedings of Odyssey 2020 The Speaker and Language Recognition Workshop. Tokyo: International Speech Communication Association, 2020, pp. 289-295. ISSN 2312-2846.
BURGET Lukáš, GLEMBEK Ondřej, LOZANO Díez Alicia, MATĚJKA Pavel, NOVOTNÝ Ondřej, PLCHOT Oldřich, PULUGUNDLA Bhargav, ROHDIN Johan A., SILNOVA Anna and VESELÝ Karel. BUT System Description to SdSV Challenge 2020. In: Proceedings of Short-duration Speaker Verification Challenge 2020 Workshop. Shanghai, on-line event of Interspeech 2020 Conference, 2020, pp. 1-5.
LOZANO Díez Alicia, SILNOVA Anna, PULUGUNDLA Bhargav, ROHDIN Johan A., VESELÝ Karel, BURGET Lukáš, PLCHOT Oldřich, GLEMBEK Ondřej, NOVOTNÝ Ondřej and MATĚJKA Pavel. BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Shanghai: International Speech Communication Association, 2020, pp. 761-765. ISSN 1990-9772.