Publication Details

Deep Neural Networks and Hidden Markov Models in i-vector-based Text-Dependent Speaker Verification

ZEINALI Hossein, BURGET Lukáš, SAMETI Hossein, GLEMBEK Ondřej and PLCHOT Oldřich. Deep Neural Networks and Hidden Markov Models in i-vector-based Text-Dependent Speaker Verification. In: Proceedings of Odyssey 2016, The Speaker and Language Recognition Workshop. Bilbao: International Speech Communication Association, 2016, pp. 24-30. ISSN 2312-2846. Available from: http://www.odyssey2016.org/papers/pdfs_stamped/63.pdf
Czech title
Hluboké neuronové sítě a skryté Markovovy modely v i-vektorovém systému pro ověřování mluvčího závislém na textu
Type
conference paper
Language
English
Authors
Zeinali Hossein, Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Sameti Hossein (SHARIF)
Glembek Ondřej, Ing., Ph.D. (DCGM FIT BUT)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
URL
http://www.odyssey2016.org/papers/pdfs_stamped/63.pdf
Keywords

deep neural networks, hidden Markov models, i-vector-based, text-dependent, speaker verification

Abstract

This article is about deep neural networks and hidden Markov models in i-vector-based text-dependent speaker verification.

Annotation

Techniques making use of Deep Neural Networks (DNN) have recently been seen to bring large improvements in text-independent speaker recognition. In this paper, we verify that the DNN-based methods result in excellent performance in the context of text-dependent speaker verification as well. We build our system on the previously introduced HMM-based i-vector approach, where phone models are used to obtain frame-level alignment in order to collect sufficient statistics for i-vector extraction. For comparison, we experiment with an alternative alignment obtained directly from the output of a DNN trained for phone classification. We also experiment with DNN-based bottleneck features and their combinations with standard cepstral features. Although the i-vector approach is generally considered not suitable for text-dependent speaker verification, we show that our HMM-based approach combined with bottleneck features provides truly state-of-the-art performance on RSR2015 data.

Published
2016
Pages
24-30
Journal
Proceedings of Odyssey: The Speaker and Language Recognition Workshop, vol. 2016, no. 6, ISSN 2312-2846
Proceedings
Proceedings of Odyssey 2016, The Speaker and Language Recognition Workshop
Conference
Odyssey 2016, Bilbao, ES
Publisher
International Speech Communication Association
Place
Bilbao, ES
DOI
10.21437/Odyssey.2016-4
BibTeX
@INPROCEEDINGS{FITPUB11220,
   author = "Hossein Zeinali and Luk\'{a}\v{s} Burget and Hossein Sameti and Ond\v{r}ej Glembek and Old\v{r}ich Plchot",
   title = "Deep Neural Networks and Hidden Markov Models in i-vector-based Text-Dependent Speaker Verification",
   pages = "24--30",
   booktitle = "Proceedings of Odyssey 2016, The Speaker and Language Recognition Workshop",
   journal = "Proceedings of Odyssey: The Speaker and Language Recognition Workshop",
   volume = 2016,
   number = 06,
   year = 2016,
   location = "Bilbao, ES",
   publisher = "International Speech Communication Association",
   ISSN = "2312-2846",
   doi = "10.21437/Odyssey.2016-4",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11220"
}