Publication Details
Sequence-discriminative Training of Deep Neural Networks
Ghoshal Arnab (UEDIN)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Povey Daniel (JHU)
speech recognition, deep learning, sequence-criterion training, neural networks, reproducible research
This article presents experiments with DNN-HMM hybrid systems trained using frame-based cross-entropy and different sequence-discriminative criteria on the 300-hour Switchboard conversational telephone speech task.
Sequence-discriminative training of deep neural networks (DNNs) is investigated on a standard 300-hour American English conversational telephone speech task. Four sequence-discriminative criteria are compared: maximum mutual information (MMI), minimum phone error (MPE), state-level minimum Bayes risk (sMBR), and boosted MMI (BMMI). Two heuristics are investigated to improve the performance of DNNs trained using sequence-based criteria: lattices are regenerated after the first iteration of training, and, for MMI and BMMI, the frames where the numerator and denominator hypotheses are disjoint are removed from the gradient computation. Starting from a competitive DNN baseline trained using cross-entropy, the sequence-discriminative criteria are shown to lower word error rates by 7-9% relative, on average. Little difference is observed between the sequence-based criteria that are investigated. The experiments are done using the open-source Kaldi toolkit, which makes it possible for the wider community to reproduce these results.
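The frame-removal heuristic mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the Kaldi implementation: the function name `mmi_frame_gradient`, the array layout, and the choice to treat "disjoint" as "the reference (numerator) state has zero posterior in the denominator lattice at that frame" are assumptions made for illustration. The gradient is written in the common form of numerator occupancy minus denominator posterior, with the numerator taken as a hard alignment.

```python
import numpy as np

def mmi_frame_gradient(den_post, ref_states, reject_disjoint=True):
    """Illustrative per-frame MMI gradient with optional frame rejection.

    den_post   : (T, S) array of denominator-lattice state posteriors (assumed input).
    ref_states : (T,) array of reference state indices from the numerator alignment.
    Returns (grad, kept): grad has shape (T, S); kept is a boolean mask over frames.
    """
    den_post = np.asarray(den_post, dtype=float)
    ref_states = np.asarray(ref_states, dtype=int)
    num_frames, num_states = den_post.shape
    frames = np.arange(num_frames)

    # Hard numerator occupancy: 1 for the reference state, 0 elsewhere.
    num_post = np.zeros((num_frames, num_states))
    num_post[frames, ref_states] = 1.0

    # Gradient of the objective w.r.t. the per-state log-likelihoods (assumed form).
    grad = num_post - den_post

    kept = np.ones(num_frames, dtype=bool)
    if reject_disjoint:
        # Assumption: a frame is "disjoint" when the reference state never
        # appears in the denominator lattice, i.e. its posterior there is zero.
        kept = den_post[frames, ref_states] > 0.0
        grad[~kept] = 0.0  # rejected frames contribute nothing to the update
    return grad, kept

# Toy example: 3 frames, 3 states; the second frame's reference state (0)
# has zero denominator posterior, so that frame is dropped.
den_post = [[0.7, 0.3, 0.0],
            [0.0, 0.4, 0.6],
            [0.2, 0.8, 0.0]]
ref_states = [0, 0, 1]
grad, kept = mmi_frame_gradient(den_post, ref_states)
print(kept)
print(grad)
```

Without rejection, a disjoint frame would receive a gradient of full magnitude on the reference state, which is one plausible reason such frames are excluded; the sketch only shows the masking step, not lattice generation or backpropagation.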
@INPROCEEDINGS{FITPUB10422,
   author = "Karel Vesel\'{y} and Arnab Ghoshal and Luk\'{a}\v{s} Burget and Daniel Povey",
   title = "Sequence-discriminative Training of Deep Neural Networks",
   pages = "2345--2349",
   booktitle = "Proceedings of Interspeech 2013",
   journal = "Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013).",
   number = 8,
   year = 2013,
   location = "Lyon, FR",
   publisher = "International Speech Communication Association",
   ISBN = "978-1-62993-443-3",
   ISSN = "2308-457X",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/10422"
}