Publication Details
Semi-supervised DNN training with word selection for ASR
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
semi-supervised training, DNN, word selection, granularity of confidences
The article is about semi-supervised DNN training with word selection for Automatic Speech Recognition (ASR).
Not all questions related to semi-supervised training of a hybrid ASR system with a DNN acoustic model have been investigated in depth. In this paper, we focus on the granularity of confidences (per-sentence, per-word, per-frame) and on how the data should be used (data selection by masks, or mini-batch SGD with confidences as weights). We then propose to re-tune the system with the manually transcribed data, using both frame CE training and sMBR training. Our preferred semi-supervised recipe, which is both simple and efficient, is as follows: we select words according to the word accuracy we obtain on the development set. This recipe, which does not rely on a grid search of the training hyperparameter, generalized well to Babel Vietnamese (transcribed 11h, untranscribed 74h), Babel Bengali (transcribed 11h, untranscribed 58h) and our custom Switchboard setup (transcribed 14h, untranscribed 95h). We obtained absolute WER improvements of 2.5% for Vietnamese, 2.3% for Bengali and 3.2% for Switchboard.
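The two ways of using the untranscribed data named in the abstract (data selection by masks, or confidences as weights) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `(start_frame, end_frame, confidence)` word format, and the `threshold` parameter are assumptions made for illustration, and the paper's procedure for choosing the selection criterion from development-set word accuracy is not reproduced here.

```python
import math

def frame_weights(words, num_frames, threshold=None):
    """Expand per-word confidences into per-frame weights.

    words: list of (start_frame, end_frame, confidence), end exclusive
           (hypothetical format, assumed for this sketch).
    threshold: if given, return a 0/1 mask (word selection);
               if None, use the confidences themselves as weights.
    Frames not covered by any word keep weight 0.
    """
    weights = [0.0] * num_frames
    for start, end, conf in words:
        if threshold is None:
            value = conf
        else:
            value = 1.0 if conf >= threshold else 0.0
        for t in range(start, min(end, num_frames)):
            weights[t] = value
    return weights

def weighted_cross_entropy(posteriors, targets, weights):
    """Frame-level cross-entropy, weighted per frame and normalized
    by the total weight. Frames with zero weight are skipped."""
    total = sum(weights)
    if total == 0.0:
        return 0.0
    loss = sum(-w * math.log(p[y])
               for p, y, w in zip(posteriors, targets, weights)
               if w > 0.0)
    return loss / total

# Two decoded words over six frames; the last frame is not covered.
words = [(0, 2, 0.9), (2, 5, 0.4)]
mask = frame_weights(words, 6, threshold=0.5)   # word selection
soft = frame_weights(words, 6)                  # confidences as weights
```

With a mask, frames of rejected words contribute nothing to the gradient; with soft weights, every frame contributes in proportion to its word confidence. Either weight vector can be passed to `weighted_cross_entropy` together with the DNN posteriors and the automatically generated frame labels.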
@INPROCEEDINGS{FITPUB11584,
   author = "Karel Vesel\'{y} and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "Semi-supervised DNN training with word selection for ASR",
   pages = "3687--3691",
   booktitle = "Proceedings of Interspeech 2017",
   journal = "Proceedings of Interspeech - on-line",
   volume = 2017,
   number = 08,
   year = 2017,
   location = "Stockholm, SE",
   publisher = "International Speech Communication Association",
   ISSN = "1990-9772",
   doi = "10.21437/Interspeech.2017-1385",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11584"
}