Result Details

Sequence Summarizing Neural Network for Speaker Adaptation

VESELÝ, K.; WATANABE, S.; ŽMOLÍKOVÁ, K.; KARAFIÁT, M.; BURGET, L.; ČERNOCKÝ, J. Sequence Summarizing Neural Network for Speaker Adaptation. In Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), 2016. Shanghai: IEEE Signal Processing Society, 2016. p. 5315-5319. ISBN: 978-1-4799-9988-0.

Type

conference paper

Language

English

Authors

Veselý Karel, Ing., Ph.D., FIT (FIT), DCGM (FIT)
Watanabe Shinji, FIT (FIT)
Žmolíková Kateřina, Ing., Ph.D., FIT (FIT)
Karafiát Martin, Ing., Ph.D., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)

Abstract

In this paper, we propose a DNN adaptation technique in which the i-vector extractor is replaced by a Sequence Summarizing Neural Network (SSNN). Like an i-vector extractor, the SSNN produces a "summary vector" representing an acoustic summary of an utterance. This vector is appended to the input of the main network, and both networks are trained jointly by optimizing a single loss function. The i-vector and SSNN speaker adaptation methods are compared on AMI meeting data. The results show comparable performance of the two techniques on an FBANK system with frame-classification training. Moreover, appending both the i-vector and the "summary vector" to the FBANK features leads to an additional improvement, with performance comparable to that of an FMLLR-adapted DNN system.
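The data flow described in the abstract can be illustrated with a minimal forward-pass sketch in plain Python. It is not the authors' implementation. The layer sizes, the random weights, the single hidden layer per network, and the helper names (`make_layer`, `summary_vector`, `forward`) are all assumptions made for illustration, and the summary is assumed here to be a mean over the per-frame outputs of the summarizing network. Training is not shown; since the summary vector is a differentiable function of the input frames, a single frame-level loss on the main network's outputs could in principle update both networks.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one affine layer (illustrative only)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(layer, x):
    """Compute W x + b for one input vector."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid_vec(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def summary_vector(frames, ssnn_hidden, ssnn_out):
    """Run the summarizing network on every frame, then average over time.

    Mean pooling is an assumption of this sketch.
    """
    per_frame = [affine(ssnn_out, sigmoid_vec(affine(ssnn_hidden, f))) for f in frames]
    n = len(per_frame)
    return [sum(column) / n for column in zip(*per_frame)]


def forward(frames, ssnn_hidden, ssnn_out, main_hidden, main_out):
    """Append the utterance-level summary to each frame and run the main network."""
    summary = summary_vector(frames, ssnn_hidden, ssnn_out)
    posteriors = []
    for f in frames:
        hidden = sigmoid_vec(affine(main_hidden, f + summary))  # list concatenation
        posteriors.append(softmax(affine(main_out, hidden)))
    return summary, posteriors


# Hypothetical dimensions, chosen only to keep the example small.
FEAT_DIM, SUMMARY_DIM, HIDDEN_DIM, NUM_CLASSES, NUM_FRAMES = 8, 4, 16, 5, 20

rng = random.Random(0)
ssnn_hidden = make_layer(FEAT_DIM, HIDDEN_DIM, rng)
ssnn_out = make_layer(HIDDEN_DIM, SUMMARY_DIM, rng)
main_hidden = make_layer(FEAT_DIM + SUMMARY_DIM, HIDDEN_DIM, rng)
main_out = make_layer(HIDDEN_DIM, NUM_CLASSES, rng)

# Stand-in for one utterance of FBANK-like features.
frames = [[rng.gauss(0.0, 1.0) for _ in range(FEAT_DIM)] for _ in range(NUM_FRAMES)]
summary, posteriors = forward(frames, ssnn_hidden, ssnn_out, main_hidden, main_out)
```

In this sketch the same summary vector is appended to every frame of the utterance, so the main network sees utterance-level information alongside each frame's features.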

Keywords

DNN, adaptation, i-vector, sequence summary, SSNN

URL

https://www.fit.vut.cz/research/group/speech/public/publi/2016/vesely… PDF

Annotation

In this paper, we proposed an alternative method to produce DNN adaptation vectors similar to i-vectors. The vectors are computed by the Sequence Summarizing Neural Network and characterize the acoustics in an utterance.

Published

2016

Pages

5315–5319

Proceedings

Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), 2016

Conference

41st IEEE International Conference on Acoustics, Speech and Signal Processing

ISBN

978-1-4799-9988-0

Publisher

IEEE Signal Processing Society

Place

Shanghai

DOI

10.1109/ICASSP.2016.7472692

UT WoS

000388373405093

EID Scopus

2-s2.0-84973294668

BibTeX

@inproceedings{BUT130964,
  author="Karel {Veselý} and Shinji {Watanabe} and Kateřina {Žmolíková} and Martin {Karafiát} and Lukáš {Burget} and Jan {Černocký}",
  title="Sequence Summarizing Neural Network for Speaker Adaptation",
  booktitle="Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), 2016",
  year="2016",
  pages="5315--5319",
  publisher="IEEE Signal Processing Society",
  address="Shanghai",
  doi="10.1109/ICASSP.2016.7472692",
  isbn="978-1-4799-9988-0",
  url="https://www.fit.vut.cz/research/publication/11145/"
}

Files

pdf vesely_icassp2016_0005315.pdf 178 kB

Projects

Information mining in speech acquired by distant microphones, MV, Bezpečnostní výzkum České republiky 2015-2020, VI20152020025, start: 2015-10-01, end: 2020-09-30, completed
Meeting Assistant (MINT), TAČR, Program aplikovaného výzkumu a experimentálního vývoje ALFA, TA04011311, start: 2014-10-01, end: 2017-12-31, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)