Publication Details
Improved Feature Processing for Deep Neural Networks
Povey Daniel (JHU)
Veselý Karel, Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
speech recognition, speaker recognition, neural networks, speaker adaptation
In this paper, we explore various methods of providing higher-dimensional features to DNNs, while still applying speaker adaptation with fMLLR of low dimensionality.
In this paper, we investigate alternative ways of processing MFCC-based features for use as the input to Deep Neural Networks (DNNs). Our baseline is a conventional feature pipeline that splices the 13-dimensional front-end MFCCs across 9 frames, applies LDA to reduce the dimension to 40, and then decorrelates further using MLLT. Confirming the results of other groups, we show that speaker adaptation applied on top of these features using feature-space MLLR (fMLLR) is helpful. The fact that the number of parameters of a DNN is not strongly sensitive to the input feature dimension (unlike GMM-based systems) motivated us to investigate ways to increase the dimension of the features. We investigate several approaches to deriving higher-dimensional features and verify their performance with DNNs. Our best result is obtained by splicing our baseline 40-dimensional speaker-adapted features again across 9 frames, followed by reducing the dimension to 200 or 300 using another LDA. Our final result is about 3% absolute better than our best GMM system, which is a discriminatively trained model.
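The dimension bookkeeping in the abstract can be illustrated with a minimal sketch of frame splicing. It is not the authors' implementation: the function name `splice_frames`, the edge-frame repetition at utterance boundaries, and the random projection standing in for the LDA + MLLT + fMLLR stages are all assumptions made for illustration. Only the dimensions (13, 9 frames, 40) come from the abstract.

```python
import numpy as np

def splice_frames(feats, context=4):
    """Stack each frame with `context` neighbours on each side.

    Edge frames are repeated at utterance boundaries (an assumption;
    the abstract does not say how boundaries are handled).
    """
    num_frames, _ = feats.shape
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    # Column block k holds the frame at offset (k - context) from the centre.
    return np.hstack([padded[k:k + num_frames] for k in range(2 * context + 1)])

rng = np.random.default_rng(0)
mfcc = rng.standard_normal((100, 13))      # 100 frames of 13-dim MFCCs (synthetic)

spliced = splice_frames(mfcc)              # 9 frames x 13 dims = 117 dims

# Placeholder for LDA + MLLT + fMLLR: a random 117 -> 40 projection.
# The real transforms are estimated from data; this only fixes the shapes.
placeholder_transform = rng.standard_normal((117, 40))
adapted = spliced @ placeholder_transform  # 40-dim "speaker-adapted" features

respliced = splice_frames(adapted)         # 9 frames x 40 dims = 360 dims
print(spliced.shape, adapted.shape, respliced.shape)
```

The second splice yields 360-dimensional vectors, which is the input that the second LDA described in the abstract would reduce to 200 or 300 dimensions.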
@INPROCEEDINGS{FITPUB10432,
   author = "P. Shakti Rath and Daniel Povey and Karel Vesel\'{y} and Jan \v{C}ernock\'{y}",
   title = "Improved Feature Processing for Deep Neural Networks",
   pages = "109--113",
   booktitle = "Proceedings of Interspeech 2013",
   journal = "Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013).",
   number = 8,
   year = 2013,
   location = "Lyon, FR",
   publisher = "International Speech Communication Association",
   ISBN = "978-1-62993-443-3",
   ISSN = "2308-457X",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/10432"
}