Publication Details
Regularized Subspace n-Gram Model for Phonotactic iVector Extraction
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Cumani Sandro, Mgr., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
Language identification, Subspace modeling, Subspace multinomial model
This article describes an enhanced model for extracting phonotactic iVectors from n-gram counts. In the first step, a subspace n-gram model is proposed to model conditional n-gram probabilities. Modeling different 3-gram histories with separate multinomial distributions shows promising results for the long-duration condition; however, we observed model over-fitting for the short-duration conditions.
Phonotactic language identification (LID) by means of n-gram statistics and discriminative classifiers is a popular approach to the LID problem. A low-dimensional representation of the n-gram statistics enables the use of more diverse and efficient machine learning techniques in LID. Recently, we proposed the phonotactic iVector as a low-dimensional representation of the n-gram statistics. In this work, an enhanced modeling of the n-gram probabilities along with regularized parameter estimation is proposed. The proposed model consistently improves LID system performance over all conditions, by up to 15% relative to the previous state-of-the-art system. The new model also reduces the memory requirements of iVector extraction and helps to speed up subspace training. Results are presented in terms of Cavg on the NIST LRE2009 evaluation set.
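As a rough illustration of the conditional n-gram probabilities mentioned above, the following Python sketch normalizes 3-gram counts separately for each two-symbol history, so that every history gets its own multinomial distribution. It is only a plain maximum-likelihood estimate under assumed inputs, not the subspace model or the regularized estimation proposed in the paper; the function name `conditional_trigram_probs` and the toy phoneme labels are hypothetical.

```python
from collections import defaultdict


def conditional_trigram_probs(counts):
    """Turn 3-gram counts into P(p3 | p1, p2), one distribution per history.

    `counts` maps (p1, p2, p3) tuples to non-negative counts.
    This is a plain maximum-likelihood estimate for illustration only.
    """
    # Total count observed for each two-symbol history (p1, p2).
    history_totals = defaultdict(float)
    for (p1, p2, _p3), c in counts.items():
        history_totals[(p1, p2)] += c

    # Normalize each count by the total of its own history.
    return {
        (p1, p2, p3): c / history_totals[(p1, p2)]
        for (p1, p2, p3), c in counts.items()
        if history_totals[(p1, p2)] > 0
    }


# Toy counts with made-up phoneme labels (assumption, not data from the paper).
counts = {
    ("a", "b", "a"): 3,
    ("a", "b", "c"): 1,
    ("b", "c", "a"): 2,
}
probs = conditional_trigram_probs(counts)
```

With few observations per history, as in short utterances, such per-history estimates rest on very small counts, which is consistent with the over-fitting the annotation reports for the short-duration conditions.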
@INPROCEEDINGS{FITPUB10449,
   author = "Mohammad Mehdi Soufifar and Luk\'{a}\v{s} Burget and Old\v{r}ich Plchot and Sandro Cumani and Jan \v{C}ernock\'{y}",
   title = "Regularized Subspace n-Gram Model for Phonotactic iVector Extraction",
   pages = "74--78",
   booktitle = "Proceedings of Interspeech 2013",
   journal = "Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013).",
   number = 8,
   year = 2013,
   location = "Lyon, FR",
   publisher = "International Speech Communication Association",
   ISBN = "978-1-62993-443-3",
   ISSN = "2308-457X",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/10449"
}