Publication Details
Bottle-Neck Feature Extraction Structures for Multilingual Training and Porting
DNN topology; Stacked Bottle-Neck; feature extraction; multilingual training; system porting
This article describes Bottle-Neck feature extraction structures for multilingual training and porting.
Stacked-Bottle-Neck (SBN) feature extraction is a crucial part of modern automatic speech recognition (ASR) systems. The SBN network traditionally contains a hidden layer between the BN and output layers. Recently, we have observed that an SBN architecture without this hidden layer (i.e., with a direct connection from the BN layer to the output layer) performs better for a single language but fails in scenarios where a network pre-trained in a multilingual fashion is ported to a target language. In this paper, we describe two strategies that allow the direct-connection SBN network to benefit from pre-training with a multilingual net: (1) pre-training a multilingual net with the hidden layer, which is discarded before porting to the target language, and (2) using only the direct-connection SBN with triphone targets both in multilingual pre-training and in porting to the target language. The results are reported on IARPA-BABEL limited language pack (LLP) data.
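Strategy (1) can be illustrated with a minimal sketch. It is not the authors' implementation: it models a single network rather than the full stacked pair, omits training entirely, and all layer sizes, function names, and the random initialization are hypothetical choices made only for illustration. It shows the structural step alone: a multilingual network is built with a hidden layer after the BN layer, and porting keeps the layers up to the BN layer, discards the rest, and attaches a fresh target-language output layer directly to the BN layer.

```python
import numpy as np


def init_layer(rng, n_in, n_out):
    """Return a (weights, bias) pair with small random weights (illustrative init)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def build_net(rng, n_in, n_hidden, n_bn, n_out, post_bn_hidden):
    """Build layers: input -> hidden -> BN -> [optional hidden] -> output."""
    sizes = [n_in, n_hidden, n_bn]
    if post_bn_hidden:
        sizes.append(n_hidden)  # the hidden layer between the BN and output layers
    sizes.append(n_out)
    return [init_layer(rng, a, b) for a, b in zip(sizes[:-1], sizes[1:])]


def port_to_target(rng, net, n_through_bn, n_target_out):
    """Keep the first n_through_bn layers (ending at the BN layer), drop
    everything after them, and connect a new output layer to the BN layer."""
    kept = net[:n_through_bn]
    n_bn = kept[-1][0].shape[1]  # width of the BN layer
    return kept + [init_layer(rng, n_bn, n_target_out)]


# Hypothetical sizes: 30 inputs, 64 hidden units, 8-dim BN layer,
# 100 multilingual targets, 40 target-language targets.
rng = np.random.default_rng(0)
multi = build_net(rng, n_in=30, n_hidden=64, n_bn=8, n_out=100, post_bn_hidden=True)
ported = port_to_target(rng, multi, n_through_bn=2, n_target_out=40)
```

In this sketch the ported network reuses the pre-trained parameters up to the BN layer unchanged. In practice it would then be further trained on target-language data, which is outside the scope of the example.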
@INPROCEEDINGS{FITPUB11182,
   author = "Franti\v{s}ek Gr\'{e}zl and Martin Karafi\'{a}t",
   title = "Bottle-Neck Feature Extraction Structures for Multilingual Training and Porting",
   pages = "144--151",
   booktitle = "Procedia Computer Science",
   journal = "Procedia Computer Science",
   volume = 2016,
   number = 81,
   year = 2016,
   location = "Yogyakarta, ID",
   publisher = "Elsevier Science",
   ISSN = "1877-0509",
   doi = "10.1016/j.procs.2016.04.042",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11182"
}