Publication Details

EMPLOYMENT OF SUBSPACE GAUSSIAN MIXTURE MODELS IN SPEAKER RECOGNITION

MOTLÍČEK Petr, DEY Subhadeep, MADIKERI Srikanth and BURGET Lukáš. Employment of Subspace Gaussian Mixture Models in Speaker Recognition. In: Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. South Brisbane, Queensland: IEEE Signal Processing Society, 2015, pp. 4445-4449. ISBN 978-1-4673-6997-8. Available from: https://ieeexplore.ieee.org/document/7178811
Czech title
Využití podprostorových modelů Gaussovských směsí pro rozpoznávání mluvčího
Type
conference paper
Language
English
Authors
Motlíček Petr, Ing., Ph.D. (IDIAP)
Dey Subhadeep (IDIAP)
Madikeri Srikanth (IDIAP)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Keywords

speaker recognition, i-vectors, subspace Gaussian mixture models, automatic speech recognition

Abstract

This paper presents a Subspace Gaussian Mixture Model (SGMM) approach, employed as a probabilistic generative model to estimate speaker vector representations that are subsequently used in the speaker verification task. SGMMs have already been shown to significantly outperform traditional HMM/GMMs in Automatic Speech Recognition (ASR) applications. An extension to the basic SGMM framework allows low-dimensional speaker vectors to be robustly estimated and exploited for speaker adaptation. We propose a speaker verification framework based on low-dimensional speaker vectors estimated using SGMMs, trained in an ASR manner using manual transcriptions. To test the robustness of the system, we evaluate the proposed approach against a state-of-the-art i-vector extractor on the NIST SRE 2010 evaluation set under four utterance-length conditions: 3-10 sec, 10-30 sec, 30-60 sec, and full (untruncated) utterances. Experimental results reveal that while the i-vector system performs better on utterances truncated to 3-10 sec and 10-30 sec, noticeable improvements are observed with SGMMs, especially on full-length utterances. Finally, the proposed SGMM approach exhibits complementary properties and can thus be efficiently fused with an i-vector based speaker verification system.

Published
2015
Pages
4445-4449
Proceedings
Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing
Conference
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), Brisbane, AU
ISBN
978-1-4673-6997-8
Publisher
IEEE Signal Processing Society
Place
South Brisbane, Queensland, AU
DOI
10.1109/ICASSP.2015.7178811
UT WoS
000427402904111
BibTeX
@INPROCEEDINGS{FITPUB10952,
   author = "Petr Motl\'{i}\v{c}ek and Subhadeep Dey and Srikanth Madikeri and Luk\'{a}\v{s} Burget",
   title = "EMPLOYMENT OF SUBSPACE GAUSSIAN MIXTURE MODELS IN SPEAKER RECOGNITION",
   pages = "4445--4449",
   booktitle = "Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing",
   year = 2015,
   location = "South Brisbane, Queensland, AU",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-4673-6997-8",
   doi = "10.1109/ICASSP.2015.7178811",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/10952"
}