Publication Details
BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020
Silnova Anna, MSc., Ph.D. (DCGM FIT BUT)
Pulugundla Bhargav, M.Sc. (DCGM FIT BUT)
Rohdin Johan A., Dr. (DCGM FIT BUT)
Veselý Karel, Ing., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Glembek Ondřej, Ing., Ph.D. (DCGM FIT BUT)
Novotný Ondřej, Ing., Ph.D. (DCGM FIT BUT)
Matějka Pavel, Ing., Ph.D. (DCGM FIT BUT)
text-dependent speaker verification, phrase-dependent PLDA, phrase recognizer
In this paper, we present the winning BUT submission for the text-dependent task of the SdSV Challenge 2020. Given the large amount of training data available in this challenge, we explore techniques that have been successful in text-independent systems and apply them to the text-dependent scenario. In particular, we train x-vector extractors on both in-domain and out-of-domain datasets and combine them with i-vectors trained on concatenated MFCCs and bottleneck features, which have proven effective for the text-dependent scenario. Moreover, we propose the use of a phrase-dependent PLDA backend for scoring and its combination with a simple phrase recognizer, which brings up to 63% relative improvement on our development set with respect to using standard PLDA. Finally, we combine our different i-vector and x-vector based systems using a simple linear logistic regression score-level fusion, which provides 28% relative improvement on the evaluation set with respect to our best single system.
@INPROCEEDINGS{FITPUB12378,
   author = "Alicia D\'{i}ez Lozano and Anna Silnova and Bhargav Pulugundla and A. Johan Rohdin and Karel Vesel\'{y} and Luk\'{a}\v{s} Burget and Old\v{r}ich Plchot and Ond\v{r}ej Glembek and Ond\v{r}ej Novotn\'{y} and Pavel Mat\v{e}jka",
   title = "BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020",
   pages = "761--765",
   booktitle = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
   journal = "Proceedings of Interspeech - on-line",
   volume = 2020,
   number = 10,
   year = 2020,
   location = "Shanghai, CN",
   publisher = "International Speech Communication Association",
   ISSN = "1990-9772",
   doi = "10.21437/Interspeech.2020-2882",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12378"
}