Publication Details

Sequence Summarizing Neural Network for Speaker Adaptation

VESELÝ Karel, WATANABE Shinji, ŽMOLÍKOVÁ Kateřina, KARAFIÁT Martin, BURGET Lukáš and ČERNOCKÝ Jan. Sequence Summarizing Neural Network for Speaker Adaptation. In: Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), 2016. Shanghai: IEEE Signal Processing Society, 2016, pp. 5315-5319. ISBN 978-1-4799-9988-0.
Czech title
Neuronové sítě shrnující sekvence pro adaptaci na mluvčího
Type
conference paper
Language
English
Authors
Veselý Karel, Ing., Ph.D. (DCGM FIT BUT)
Watanabe Shinji, Dr. (JHU)
Žmolíková Kateřina, Ing., Ph.D. (DCGM FIT BUT)
Karafiát Martin, Ing., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
URL
Keywords

DNN, adaptation, i-vector, sequence summary, SSNN

Abstract

In this paper, we propose a DNN adaptation technique in which the i-vector extractor is replaced by a Sequence Summarizing Neural Network (SSNN). Like an i-vector extractor, the SSNN produces a "summary vector" that represents an acoustic summary of an utterance. This vector is appended to the input of the main network, and both networks are trained jointly to optimize a single loss function. The i-vector and SSNN speaker adaptation methods are compared on AMI meeting data. The results show comparable performance of the two techniques on an FBANK system with frame-classification training. Moreover, appending both the i-vector and the "summary vector" to the FBANK features yields an additional improvement, reaching performance comparable to that of an FMLLR-adapted DNN system.
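
The data flow described in the abstract can be illustrated with a minimal NumPy sketch of the forward pass. It is not the authors' implementation: the single-layer networks, the layer sizes, the `tanh` nonlinearity, the use of a mean over frames to form the summary, and all function and parameter names are assumptions made only for illustration.

```python
import numpy as np


def init_layer(rng, n_in, n_out):
    """Return small random weights and zero biases for one affine layer."""
    return rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out)


def summary_vector(feats, w, b):
    """Map an utterance (T frames x D features) to one fixed-size vector.

    Assumption: the per-frame outputs of the summarizing network are
    averaged over time, so every utterance yields a single vector.
    """
    hidden = np.tanh(feats @ w + b)  # shape (T, S)
    return hidden.mean(axis=0)       # shape (S,)


def append_summary(feats, summary):
    """Append the same summary vector to every frame of the utterance."""
    tiled = np.tile(summary, (feats.shape[0], 1))  # shape (T, S)
    return np.concatenate([feats, tiled], axis=1)  # shape (T, D + S)


def main_network(x, w, b):
    """Hypothetical one-layer main network with a softmax output."""
    logits = x @ w + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


# Illustrative sizes only: frames, feature dim, summary dim, output classes.
T, D, S, C = 50, 40, 8, 10
rng = np.random.default_rng(0)
feats = rng.standard_normal((T, D))  # stand-in for FBANK features

ssnn_w, ssnn_b = init_layer(rng, D, S)
main_w, main_b = init_layer(rng, D + S, C)

summary = summary_vector(feats, ssnn_w, ssnn_b)
augmented = append_summary(feats, summary)
posteriors = main_network(augmented, main_w, main_b)
```

Because the main network's output depends on the summary vector, a single loss computed on `posteriors` would produce gradients for both sets of parameters, which is what allows the two networks to be trained together. The sketch covers only the forward pass and omits training.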

Annotation

In this paper, we propose an alternative method for producing DNN adaptation vectors similar to i-vectors. The vectors are computed by the Sequence Summarizing Neural Network and characterize the acoustics of an utterance.

Published
2016
Pages
5315-5319
Proceedings
Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), 2016
Conference
41st IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, CN
ISBN
978-1-4799-9988-0
Publisher
IEEE Signal Processing Society
Place
Shanghai, CN
DOI
10.1109/ICASSP.2016.7472692
UT WoS
000388373405093
EID Scopus
BibTeX
@INPROCEEDINGS{FITPUB11145,
   author = "Karel Vesel\'{y} and Shinji Watanabe and Kate\v{r}ina \v{Z}mol\'{i}kov\'{a} and Martin Karafi\'{a}t and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "Sequence Summarizing Neural Network for Speaker Adaptation",
   pages = "5315--5319",
   booktitle = "Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), 2016",
   year = 2016,
   location = "Shanghai, CN",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-4799-9988-0",
   doi = "10.1109/ICASSP.2016.7472692",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11145"
}