Publication Details
BUT Neural Network Features for Spontaneous Vietnamese in BABEL
Grézl František, Ing., Ph.D. (DCGM FIT BUT)
Hannemann Mirko, Dipl.-Ing. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
speech recognition, discriminative training, bottleneck neural networks, adaptation of neural networks, region-dependent transforms
The paper deals with multiple facets of training NN-based feature extraction. Not surprisingly, we found that data preparation is crucial for the success of NN training. When data from other (well-represented) languages are available, they should be used, as we have shown that multilingual fine-tuning outperforms unsupervised training.
This paper presents our work on speech recognition of spontaneous Vietnamese telephone conversations. It focuses on feature extraction by Stacked Bottle-Neck neural networks: several improvements, such as semi-supervised training on untranscribed data, increasing the precision of state targets, and CMLLR adaptation, were investigated. We also tested speaker-adaptive training of this architecture and found a significant gain. The results are reported on BABEL Vietnamese data.
@INPROCEEDINGS{FITPUB10554,
   author = "Martin Karafi\'{a}t and Franti\v{s}ek Gr\'{e}zl and Mirko Hannemann and Jan \v{C}ernock\'{y}",
   title = "BUT Neural Network Features for Spontaneous Vietnamese in BABEL",
   pages = "5659--5663",
   booktitle = "Proceedings of ICASSP 2014",
   year = 2014,
   location = "Florence, IT",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-4799-2892-7",
   doi = "10.1109/ICASSP.2014.6854679",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/10554"
}