Data selection by sequence summarizing neural network in mismatch condition training

Czech title

Výběr dat pomocí sekvenční sumarizační neuronové sítě v trénování na datech z odlišných podmínek

Type

conference paper

Language

English

Authors

Žmolíková Kateřina, Ing., Ph.D. (DCGM FIT BUT)
Karafiát Martin, Ing., Ph.D. (DCGM FIT BUT)
Veselý Karel, Ing., Ph.D. (DCGM FIT BUT)
Delcroix Marc (NTT)
Watanabe Shinji, Dr. (JHU)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)

Keywords

Automatic speech recognition, Data augmentation, Data selection, Mismatch training condition, Sequence summarization

Abstract

Data augmentation is a simple and efficient technique for improving the robustness of a speech recognizer deployed under mismatched training and test conditions. This paper proposes a new approach to selecting data according to the similarity of acoustic conditions. The similarity is computed using a sequence summarizing neural network, which extracts vectors containing an acoustic summary (e.g. noise and reverberation characteristics) of an utterance. Several configurations of this network and different methods of selecting data using these "summary-vectors" were explored. Results are reported for a mismatched condition, using the AMI training set with the proposed data selection and the CHiME3 test set.
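
As a rough illustration of the selection idea described in the abstract, the sketch below ranks candidate training utterances by how close their summary vectors are to the target condition. It is not the authors' implementation. The per-frame vectors stand in for the outputs of a trained network, and the averaging over frames, the cosine similarity, the top-k rule, and the names `summarize`, `cosine`, and `select_utterances` are all assumptions made for this example.

```python
import math


def summarize(frames):
    """Average a list of equal-length vectors into one summary vector.

    Assumption: the utterance-level summary is the mean of per-frame
    vectors; the frames here are hypothetical network outputs.
    """
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_utterances(candidates, target_summaries, k):
    """Return the ids of the k candidates most similar to the target condition.

    candidates: dict mapping utterance id -> summary vector
    target_summaries: summary vectors of utterances from the target condition
    Assumption: the target condition is represented by the mean of its
    summary vectors, and similarity is measured with cosine similarity.
    """
    target = summarize(target_summaries)
    ranked = sorted(
        candidates,
        key=lambda uid: cosine(candidates[uid], target),
        reverse=True,
    )
    return ranked[:k]
```

Under these assumptions, calling `select_utterances` with summary vectors for the augmented training pool and for a sample of test-condition utterances would return the subset of the pool to train on. Other selection rules, such as a similarity threshold, would fit the same interface.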

Published

2016

Pages

2354-2358

Proceedings

Proceedings of Interspeech 2016

Conference

Interspeech Conference, San Francisco, US

ISBN

978-1-5108-3313-5

Publisher

International Speech Communication Association

Place

San Francisco, US

DOI

10.21437/Interspeech.2016-741

UT WoS

000409394401175

EID Scopus

2-s2.0-84994382229

BibTeX

@INPROCEEDINGS{FITPUB11271,
   author = "Kate\v{r}ina \v{Z}mol\'{i}kov\'{a} and Martin Karafi\'{a}t and Karel Vesel\'{y} and Marc Delcroix and Shinji Watanabe and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "Data selection by sequence summarizing neural network in mismatch condition training",
   pages = "2354--2358",
   booktitle = "Proceedings of Interspeech 2016",
   year = 2016,
   location = "San Francisco, US",
   publisher = "International Speech Communication Association",
   ISBN = "978-1-5108-3313-5",
   doi = "10.21437/Interspeech.2016-741",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11271"
}