Publication Details

Semi-supervised DNN training with word selection for ASR

VESELÝ Karel, BURGET Lukáš and ČERNOCKÝ Jan. Semi-supervised DNN training with word selection for ASR. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 3687-3691. ISSN 1990-9772. Available from: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/1385.PDF
Czech title
Částečně kontrolované trénování DNN s výběrem slov pro ASR
Type
conference paper
Language
English
Authors
VESELÝ Karel, BURGET Lukáš, ČERNOCKÝ Jan
URL
Keywords

semi-supervised training, DNN, word selection, granularity of confidences

Abstract

The paper addresses semi-supervised DNN training with word selection for automatic speech recognition (ASR).

Annotation

Several questions about semi-supervised training of hybrid ASR systems with a DNN acoustic model have not yet been investigated in depth. In this paper, we focus on the granularity of confidences (per-sentence, per-word, per-frame) and on how the data should be used (data selection by masks, or mini-batch SGD with confidences as weights). We then propose to re-tune the system with the manually transcribed data, using both frame CE training and sMBR training. Our preferred semi-supervised recipe, which is both simple and efficient, is as follows: we select words according to the word accuracy obtained on the development set. This recipe, which does not rely on a grid search over the training hyperparameter, generalized well to Babel Vietnamese (11 h transcribed, 74 h untranscribed), Babel Bengali (11 h transcribed, 58 h untranscribed) and our custom Switchboard setup (14 h transcribed, 95 h untranscribed). We obtained absolute WER improvements of 2.5% for Vietnamese, 2.3% for Bengali and 3.2% for Switchboard.
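
The two ways of using untranscribed data named in the annotation (data selection by masks, and confidences as weights) can be illustrated with a small sketch. It is not the authors' implementation. It assumes one plausible reading of the recipe, namely that the fraction of automatically transcribed words kept for training equals the word accuracy measured on the development set. The names `Word`, `select_words`, `frame_mask` and `frame_weights`, as well as the word tuple layout, are hypothetical.

```python
from typing import List, Tuple

# Hypothetical word record from a decoded, untranscribed utterance:
# (label, confidence in [0, 1], start_frame, end_frame_exclusive)
Word = Tuple[str, float, int, int]


def select_words(words: List[Word], dev_word_accuracy: float) -> List[bool]:
    """Keep the most confident words.

    Assumption: the kept fraction equals the dev-set word accuracy.
    """
    if not 0.0 <= dev_word_accuracy <= 1.0:
        raise ValueError("dev_word_accuracy must lie in [0, 1]")
    n_keep = round(dev_word_accuracy * len(words))
    # Rank word indices from most to least confident.
    ranked = sorted(range(len(words)), key=lambda i: words[i][1], reverse=True)
    keep = [False] * len(words)
    for i in ranked[:n_keep]:
        keep[i] = True
    return keep


def frame_mask(words: List[Word], keep: List[bool], num_frames: int) -> List[int]:
    """Data selection by masks: 1 for frames of kept words, 0 elsewhere."""
    mask = [0] * num_frames
    for (_, _, start, end), kept in zip(words, keep):
        if kept:
            for t in range(start, end):
                mask[t] = 1
    return mask


def frame_weights(words: List[Word], num_frames: int) -> List[float]:
    """Alternative: spread each word confidence over its frames as a weight."""
    weights = [0.0] * num_frames
    for _, conf, start, end in words:
        for t in range(start, end):
            weights[t] = conf
    return weights


if __name__ == "__main__":
    decoded = [("hello", 0.95, 0, 3), ("word", 0.40, 3, 5),
               ("again", 0.80, 5, 8), ("uh", 0.55, 8, 10)]
    kept = select_words(decoded, dev_word_accuracy=0.5)
    print(kept)
    print(frame_mask(decoded, kept, num_frames=10))
    print(frame_weights(decoded, num_frames=10))
```

In a training loop, the mask (or the weight vector) would multiply the per-frame loss, so that frames belonging to discarded words do not contribute to the gradient.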

Published
2017
Pages
3687-3691
Journal
Proceedings of Interspeech - on-line, vol. 2017, no. 8, ISSN 1990-9772
Proceedings
Proceedings of Interspeech 2017
Conference
Interspeech Conference, Stockholm, SE
Publisher
International Speech Communication Association
Place
Stockholm, SE
DOI
10.21437/Interspeech.2017-1385
UT WoS
000457505000766
EID Scopus
BibTeX
@INPROCEEDINGS{FITPUB11584,
   author = "Karel Vesel\'{y} and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "Semi-supervised DNN training with word selection for ASR",
   pages = "3687--3691",
   booktitle = "Proceedings of Interspeech 2017",
   journal = "Proceedings of Interspeech - on-line",
   volume = 2017,
   number = 08,
   year = 2017,
   location = "Stockholm, SE",
   publisher = "International Speech Communication Association",
   ISSN = "1990-9772",
   doi = "10.21437/Interspeech.2017-1385",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11584"
}