Publication Details

End-to-end DNN based text-independent speaker recognition for long and short utterances

ROHDIN Johan A., SILNOVA Anna, DIEZ Sánchez Mireia, PLCHOT Oldřich, MATĚJKA Pavel, BURGET Lukáš and GLEMBEK Ondřej. End-to-end DNN based text-independent speaker recognition for long and short utterances. Computer Speech and Language, 2020, vol. 59, pp. 22-35. ISSN 0885-2308. Available from: https://www.sciencedirect.com/science/article/pii/S0885230818303632

Czech title

Rozpoznávání mluvčího nezávislé na textu založené na End-to-end DNN přístupu pro dlouhé a krátké promluvy

Type

journal article

Language

English

Authors

Rohdin Johan A., Dr. (DCGM FIT BUT)
Silnova Anna, MSc., Ph.D. (DCGM FIT BUT)
Diez Sánchez Mireia, M.Sc., Ph.D. (DCGM FIT BUT)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Matějka Pavel, Ing., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Glembek Ondřej, Ing., Ph.D. (DCGM FIT BUT)

Keywords

Speaker verification, DNN, End-to-end, Text-independent, i-vector, PLDA

Abstract

Recently, several end-to-end speaker verification systems based on deep neural networks (DNNs) have been proposed. These systems have proven competitive for text-dependent tasks as well as for text-independent tasks with short utterances. However, for text-independent tasks with longer utterances, end-to-end systems are still outperformed by standard i-vector + PLDA systems. In this work, we present an end-to-end speaker verification system that is initialized to mimic an i-vector + PLDA baseline. The system is then further trained in an end-to-end manner but regularized so that it does not deviate too far from the initial system. In this way, we mitigate overfitting, which normally limits the performance of end-to-end systems. The proposed system outperforms the i-vector + PLDA baseline on both long and short duration utterances.
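
The idea of training end-to-end while staying close to the initial system can be illustrated with a small sketch. It assumes, purely for illustration, that the regularizer is a squared L2 distance between the current parameters and the parameters of the initial (baseline-mimicking) system, and that the training objective is binary cross-entropy over verification trial scores; the abstract does not state either choice. All names below (`bce_loss`, `l2_to_init`, `regularized_loss`, `weight`) are hypothetical.

```python
import math


def bce_loss(scores, labels):
    """Mean binary cross-entropy over trial scores.

    scores: log-odds that the two sides of a trial share a speaker.
    labels: 1 for a same-speaker trial, 0 for a different-speaker trial.
    """
    total = 0.0
    for s, y in zip(scores, labels):
        z = s if y == 1 else -s
        # -log(sigmoid(z)) written in a numerically stable form
        total += max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
    return total / len(scores)


def l2_to_init(params, init_params):
    """Squared L2 distance to the initial parameters (assumed regularizer)."""
    return sum((p - p0) ** 2 for p, p0 in zip(params, init_params))


def regularized_loss(scores, labels, params, init_params, weight=0.1):
    """Verification loss plus a penalty for drifting away from the initial system."""
    return bce_loss(scores, labels) + weight * l2_to_init(params, init_params)
```

With `weight` set to zero this reduces to unconstrained end-to-end training; a larger `weight` keeps the trained system closer to its initialization.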

Published

2020

Pages

22-35

Journal

Computer Speech and Language, vol. 59, ISSN 0885-2308

Publisher

Elsevier Science

DOI

10.1016/j.csl.2019.06.002

UT WoS

000490540900002

EID Scopus

2-s2.0-85067618095

BibTeX

@ARTICLE{FITPUB12038,
   author = "Johan A. Rohdin and Anna Silnova and Mireia {Diez S\'{a}nchez} and Old\v{r}ich Plchot and Pavel Mat\v{e}jka and Luk\'{a}\v{s} Burget and Ond\v{r}ej Glembek",
   title = "End-to-end DNN based text-independent speaker recognition for long and short utterances",
   pages = "22--35",
   journal = "Computer Speech and Language",
   volume = 59,
   year = 2020,
   ISSN = "0885-2308",
   doi = "10.1016/j.csl.2019.06.002",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12038"
}