Publication Details

Semi-supervised DNN training with word selection for ASR

VESELÝ Karel, BURGET Lukáš and ČERNOCKÝ Jan. Semi-supervised DNN training with word selection for ASR. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 3687-3691. ISSN 1990-9772. Available from: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/1385.PDF

Czech title

Částečně kontrolované trénování DNN s výběrem slov pro ASR

Type

conference paper

Language

English

Authors

Veselý Karel, Ing., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)

URL

Keywords

semi-supervised training, DNN, word selection, granularity of confidences

Abstract

This paper addresses semi-supervised DNN training with word selection for automatic speech recognition (ASR).

Annotation

Several questions about semi-supervised training of hybrid ASR systems with a DNN acoustic model have not yet been investigated in depth. In this paper, we focus on the granularity of confidences (per-sentence, per-word, per-frame) and on how the data should be used (data selection by masks, or mini-batch SGD with confidences as weights). We then propose to re-tune the system on the manually transcribed data, with both frame CE training and sMBR training. Our preferred semi-supervised recipe, which is both simple and efficient, is the following: we select words according to the word accuracy obtained on the development set. This recipe, which does not rely on a grid search over a training hyperparameter, generalized well to Babel Vietnamese (transcribed 11h, untranscribed 74h), Babel Bengali (transcribed 11h, untranscribed 58h) and our custom Switchboard setup (transcribed 14h, untranscribed 95h). We obtained absolute WER improvements of 2.5% for Vietnamese, 2.3% for Bengali and 3.2% for Switchboard.
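
The word-selection idea with data selection by masks can be illustrated with a minimal Python sketch. It is not the authors' implementation, only one possible reading of the recipe: the fraction of automatically transcribed words to keep is assumed to equal the word accuracy measured on the development set, and the frames of the kept words are marked in a binary mask. The `Word` tuple layout, the function names `confidence_threshold` and `frame_mask`, and the toy numbers are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout: (start_frame, end_frame_exclusive, confidence) of one recognized word.
Word = Tuple[int, int, float]


def confidence_threshold(confidences: List[float], keep_fraction: float) -> float:
    """Return the confidence of the lowest-ranked word that is still kept.

    keep_fraction is assumed to be the word accuracy on the development set,
    so no grid search over the threshold is needed.
    """
    if not confidences:
        raise ValueError("no confidences given")
    ranked = sorted(confidences, reverse=True)
    n_keep = max(1, min(len(ranked), round(keep_fraction * len(ranked))))
    return ranked[n_keep - 1]


def frame_mask(words: List[Word], num_frames: int, threshold: float) -> List[int]:
    """Build a 0/1 mask over frames: 1 for frames of words at or above the threshold."""
    mask = [0] * num_frames
    for start, end, conf in words:
        if conf >= threshold:
            for t in range(start, min(end, num_frames)):
                mask[t] = 1
    return mask


# Toy utterance of 10 frames with four recognized words (made-up values).
words = [(0, 3, 0.95), (3, 5, 0.40), (5, 9, 0.80), (9, 10, 0.60)]
threshold = confidence_threshold([c for _, _, c in words], keep_fraction=0.5)
mask = frame_mask(words, num_frames=10, threshold=threshold)
print(threshold, mask)
```

In a training loop, frames with mask value 0 would simply be excluded from the mini-batches. The alternative mentioned above, using confidences as weights in mini-batch SGD, would instead keep every frame and scale its gradient by the confidence.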

Published

2017

Pages

3687-3691

Journal

Proceedings of Interspeech - on-line, vol. 2017, no. 8, ISSN 1990-9772

Proceedings

Proceedings of Interspeech 2017

Conference

Interspeech Conference, Stockholm, SE

Publisher

International Speech Communication Association

Place

Stockholm, SE

DOI

10.21437/Interspeech.2017-1385

UT WoS

000457505000766

EID Scopus

2-s2.0-85039170370

BibTeX

@INPROCEEDINGS{FITPUB11584,
   author = "Karel Vesel\'{y} and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "Semi-supervised DNN training with word selection for ASR",
   pages = "3687--3691",
   booktitle = "Proceedings of Interspeech 2017",
   journal = "Proceedings of Interspeech - on-line",
   volume = 2017,
   number = 08,
   year = 2017,
   location = "Stockholm, SE",
   publisher = "International Speech Communication Association",
   ISSN = "1990-9772",
   doi = "10.21437/Interspeech.2017-1385",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11584"
}