Publication Details
End-to-End DNN Based Speaker Recognition Inspired by i-Vector and PLDA
Silnova Anna, MSc., Ph.D. (DCGM FIT BUT)
Diez Sánchez Mireia, M.Sc., Ph.D. (DCGM FIT BUT)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Matějka Pavel, Ing., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Speaker verification, DNN, end-to-end
Recently, several end-to-end speaker verification systems based on deep neural networks (DNNs) have been proposed. These systems have been shown to be competitive for text-dependent tasks, as well as for text-independent tasks with short utterances. However, for text-independent tasks with longer utterances, end-to-end systems are still outperformed by standard i-vector + PLDA systems. In this work, we develop an end-to-end speaker verification system that is initialized to mimic an i-vector + PLDA baseline. The system is then further trained in an end-to-end manner, but regularized so that it does not deviate too far from the initial system. In this way, we mitigate the overfitting that normally limits the performance of end-to-end systems. The proposed system outperforms the i-vector + PLDA baseline on both long- and short-duration utterances.
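The abstract does not spell out the scoring layer, the training loss, or the form of the regularizer. The following is a minimal NumPy sketch of the "initialize, then train without drifting far" idea. It rests on three assumptions that are not taken from the paper: (a) the network ends in a PLDA-style quadratic scoring function of two embeddings, (b) verification trials are scored with binary cross-entropy, and (c) "not deviating too far from the initial system" is implemented as an L2 penalty on the distance between the current parameters and their initial values. The function names `plda_style_score`, `bce_loss`, and `regularized_objective` and the weight `lam` are hypothetical.

```python
import numpy as np


def plda_style_score(x1, x2, Lam, Gam, c, k):
    """Quadratic trial score of the form a two-covariance PLDA backend yields:
    s = x1'Lam x2 + x1'Gam x1 + x2'Gam x2 + c'(x1 + x2) + k.
    Lam and Gam are assumed symmetric (illustrative parameterization)."""
    return x1 @ Lam @ x2 + x1 @ Gam @ x1 + x2 @ Gam @ x2 + c @ (x1 + x2) + k


def bce_loss(scores, labels):
    """Mean binary cross-entropy over trials, treating scores as log-odds.
    labels: 1 = target (same speaker), 0 = non-target."""
    # Target: log(1 + exp(-s)); non-target: log(1 + exp(s)), computed stably.
    signs = np.where(labels == 1, -1.0, 1.0)
    return np.mean(np.logaddexp(0.0, signs * scores))


def regularized_objective(params, init_params, scores, labels, lam):
    """Trial loss plus an L2 penalty pulling every parameter back toward the
    value it had when the model was initialized from the i-vector + PLDA system
    (assumed regularizer, hypothetical weight `lam`)."""
    penalty = sum(np.sum((p - p0) ** 2) for p, p0 in zip(params, init_params))
    return bce_loss(scores, labels) + lam * penalty


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4
    # Hypothetical initial parameters standing in for values derived from a trained PLDA model.
    init = [np.eye(d), -0.5 * np.eye(d), np.zeros(d), np.array(0.0)]
    params = [p.copy() for p in init]
    x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
    s = plda_style_score(x1, x2, *params)
    # At initialization the penalty is zero, so the objective equals the plain trial loss.
    print(regularized_objective(params, init, np.array([s]), np.array([1]), lam=0.1))
```

The penalty is zero at initialization and grows as training moves the parameters away from the baseline. A larger `lam` keeps the model closer to the i-vector + PLDA starting point, which is one way to trade fitting capacity against the overfitting the abstract mentions.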
@INPROCEEDINGS{FITPUB11724,
   author = "A. Johan Rohdin and Anna Silnova and Mireia S\'{a}nchez Diez and Old\v{r}ich Plchot and Pavel Mat\v{e}jka and Luk\'{a}\v{s} Burget",
   title = "End-to-End DNN Based Speaker Recognition Inspired by i-Vector and PLDA",
   pages = "4874--4878",
   booktitle = "Proceedings of ICASSP",
   year = 2018,
   location = "Calgary, CA",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-5386-4658-8",
   doi = "10.1109/ICASSP.2018.8461958",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11724"
}