Publication Details
Boosting Performance on Low-resource Languages by Standard Corpora: AN ANALYSIS
DNN topology, Stacked Bottle-neck, feature extraction, multilingual training, system porting, low resource
In this paper, we evaluated multilingual techniques in a single source-language scenario. Since it is hard to obtain coherent multilingual corpora usable for multilingual training, using a single, well-resourced language instead is quite attractive.
In this paper, we analyze the feasibility of using a single well-resourced language, English, as the source language for multilingual techniques in the context of a Stacked Bottle-Neck tandem system. The effect of the amount of data and the number of tied states in the source language on the performance of the ported system is evaluated together with different porting strategies. Generally, increasing both the amount of data and the level of detail is beneficial, with the greater effect observed for increasing the number of tied states. The modified neural network structure, shown to be useful for multilingual porting, was also evaluated with its specific porting procedure. Using the original NN structure in combination with the modified adapt-adapt porting strategy was found to work best. It achieves a relative improvement of 3.5-8.8% on a variety of target languages. These results are comparable with those obtained using multilingual NNs pretrained on 7 languages.
@INPROCEEDINGS{FITPUB11311,
   author = "Franti\v{s}ek Gr\'{e}zl and Martin Karafi\'{a}t",
   title = "Boosting Performance on Low-resource Languages by Standard Corpora: AN ANALYSIS",
   pages = "629--636",
   booktitle = "Proceeding of SLT 2016",
   year = 2016,
   location = "San Diego, US",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-5090-4903-5",
   doi = "10.1109/SLT.2016.7846329",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11311"
}