Publication Details

Subspace Gaussian mixture models for speech recognition

POVEY Daniel, BURGET Lukáš, AGARWAL Mohit, AKYAZI Pinar, FENG Kai, GHOSHAL Arnab, GLEMBEK Ondřej, GOEL Nagendra K., KARAFIÁT Martin, RASTROW Ariya, ROSE Richard, SCHWARZ Petr and THOMAS Samuel. Subspace Gaussian mixture models for speech recognition. In: Proc. International Conference on Acoustics, Speech, and Signal Processing. Dallas: IEEE Signal Processing Society, 2010, pp. 4330-4333. ISBN 978-1-4244-4296-6. ISSN 1520-6149.
Czech title
Sub-space gaussovské modely pro rozpoznávání řeči
Type
conference paper
Language
English
Authors
Povey Daniel (JHU)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Agarwal Mohit (IIIT)
Akyazi Pinar (UBOGAZ)
Feng Kai (HKUST)
Ghoshal Arnab (UEDIN)
Glembek Ondřej, Ing., Ph.D. (DCGM FIT BUT)
Goel Nagendra K. (GOVIVACE)
Karafiát Martin, Ing., Ph.D. (DCGM FIT BUT)
Rastrow Ariya (JHU)
Rose Richard (MCGILL)
Schwarz Petr, Ing., Ph.D. (DCGM FIT BUT)
Thomas Samuel (JHU)
Keywords

Speech Recognition, Hidden Markov Models, Gaussian Mixture Models

Abstract

This paper presents subspace Gaussian mixture models for speech recognition: an acoustic modeling approach in which all phonetic states share a common Gaussian mixture model (GMM) structure.

Annotation

We describe an acoustic modeling approach in which all phonetic states share a common Gaussian Mixture Model structure, and the means and mixture weights vary in a subspace of the total parameter space. We call this a Subspace Gaussian Mixture Model (SGMM). Globally shared parameters define the subspace. This style of acoustic model allows for a much more compact representation and gives better results than a conventional modeling approach, particularly with smaller amounts of training data.
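The idea of state parameters that "vary in a subspace" can be illustrated with a minimal sketch. It assumes a common parameterization, not spelled out in this record: each state j has a low-dimensional vector v, the mean of Gaussian i is a linear projection M_i v, and the mixture weights are a softmax over w_i · v. The function name `sgmm_state_params` and the toy dimensions are hypothetical, chosen only for illustration.

```python
import math


def sgmm_state_params(M, w, v):
    """Derive one state's means and mixture weights from shared parameters.

    M: list of I projection matrices (each D x S, nested lists), shared globally
    w: list of I weight-projection vectors (each of length S), shared globally
    v: state-specific vector of length S (the only per-state parameters here)
    """
    # Mean of Gaussian i (assumed form): mu_i = M_i v
    means = [
        [sum(m * x for m, x in zip(row, v)) for row in M_i]
        for M_i in M
    ]
    # Weight of Gaussian i (assumed form): softmax over i of w_i . v
    logits = [sum(a * x for a, x in zip(w_i, v)) for w_i in w]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return means, weights


if __name__ == "__main__":
    # Toy sizes (illustrative only): I = 2 Gaussians, D = 3 features, S = 2
    M = [
        [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]],
    ]
    w = [[0.5, -0.5], [-0.5, 0.5]]
    v_state = [1.0, 2.0]
    means, weights = sgmm_state_params(M, w, v_state)
    print("means:", means)
    print("weights:", weights)
```

In this sketch each state stores only the S numbers in v, while a conventional GMM would store I × D mean values plus I weights per state. That difference is the source of the compactness described above.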

Published
2010
Pages
4330-4333
Journal
Proc. International Conference on Acoustics, Speech, and Signal Processing, vol. 2010, no. 3, ISSN 1520-6149
Proceedings
Proc. International Conference on Acoustics, Speech, and Signal Processing
Conference
International Conference on Acoustics, Speech, and Signal Processing 2010, Dallas, US
ISBN
978-1-4244-4296-6
Publisher
IEEE Signal Processing Society
Place
Dallas, US
BibTeX
@INPROCEEDINGS{FITPUB9311,
   author = "Daniel Povey and Luk\'{a}\v{s} Burget and Mohit Agarwal and Pinar Akyazi and Kai Feng and Arnab Ghoshal and Ond\v{r}ej Glembek and Nagendra K. Goel and Martin Karafi\'{a}t and Ariya Rastrow and Richard Rose and Petr Schwarz and Samuel Thomas",
   title = "Subspace Gaussian mixture models for speech recognition",
   pages = "4330--4333",
   booktitle = "Proc. International Conference on Acoustics, Speech, and Signal Processing",
   journal = "Proc. International Conference on Acoustics, Speech, and Signal Processing",
   volume = 2010,
   number = 3,
   year = 2010,
   location = "Dallas, US",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-4244-4296-6",
   ISSN = "1520-6149",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/9311"
}