Publication Details

HMM-Based Phrase-Independent i-Vector Extractor for Text-Dependent Speaker Verification

ZEINALI Hossein, SAMETI Hossein and BURGET Lukáš. HMM-Based Phrase-Independent i-Vector Extractor for Text-Dependent Speaker Verification. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, vol. 25, no. 7, 2017, pp. 1421-1435. ISSN 2329-9290. Available from: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7902120
Czech title
Extraktor i-vektorů pro ověřování mluvčího závislé na textu založený na HMM a nezávislý na promluvě
Type
journal article
Language
English
Authors
Zeinali Hossein (AUT.IR)
Sameti Hossein (SHARIF)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
URL
Keywords

Bottleneck features, DNN, hidden Markov model (HMM), i-vector, text-dependent speaker verification.

Abstract

This article describes a new HMM structure for text-dependent speaker verification, which enables the authors to combine the HMM's ability to model time sequences with the established i-vector technique.

Annotation

The low-dimensional i-vector representation of speech segments is used in state-of-the-art text-independent speaker verification systems. However, i-vectors were deemed unsuitable for the text-dependent task, where simpler and older speaker recognition approaches were found more effective. In this work, we propose a straightforward hidden Markov model (HMM) based extension of the i-vector approach, which allows i-vectors to be successfully applied to text-dependent speaker verification. In our approach, the Universal Background Model (UBM) for training the phrase-independent i-vector extractor is based on a set of monophone HMMs instead of the standard Gaussian Mixture Model (GMM). To compensate for the channel variability, we propose to precondition i-vectors using a regularized variant of within-class covariance normalization, which can be robustly estimated in a phrase-dependent fashion on the small datasets available for the text-dependent task. The verification scores are cosine similarities between the i-vectors, normalized using phrase-dependent s-norm. The experimental results on the RSR2015 and RedDots databases confirm the effectiveness of the proposed approach, especially in rejecting test utterances with a wrong phrase. A simple MFCC-based i-vector/HMM system performs competitively when compared to very computationally expensive DNN-based approaches or the conventional relevance MAP GMM-UBM, which does not allow for compact speaker representations. To our knowledge, this paper presents the best published results obtained with a single system on both the RSR2015 and RedDots datasets.
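
The scoring step mentioned in the abstract (cosine similarity followed by s-norm) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the s-norm form used here is the commonly cited symmetric average of two z-normalized scores, and the cohort is assumed to hold impostor i-vectors of the same phrase as the trial. The within-class covariance preconditioning is omitted.

```python
import math
from statistics import mean, pstdev


def cosine_score(a, b):
    """Cosine similarity between two i-vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def s_norm_score(enroll, test, cohort):
    """Symmetric score normalization of a cosine score (illustrative form).

    Assumption: `cohort` is a list of impostor i-vectors taken from the
    same phrase as the trial, which is what makes the normalization
    phrase-dependent in this sketch.
    """
    raw = cosine_score(enroll, test)
    # Scores of each side of the trial against the cohort.
    enroll_scores = [cosine_score(enroll, c) for c in cohort]
    test_scores = [cosine_score(test, c) for c in cohort]
    # Normalize the raw score by each side's cohort statistics and average.
    z = (raw - mean(enroll_scores)) / pstdev(enroll_scores)
    t = (raw - mean(test_scores)) / pstdev(test_scores)
    return 0.5 * (z + t)
```

Because the two normalized terms are averaged, swapping the enrollment and test i-vectors leaves the score unchanged. The sketch assumes the cohort scores are not all identical, since a zero standard deviation would cause a division by zero.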

Published
2017
Pages
1421-1435
Journal
IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, vol. 25, no. 7, ISSN 2329-9290
Publisher
IEEE Signal Processing Society
DOI
10.1109/TASLP.2017.2694708
UT WoS
000403311100002
EID Scopus
BibTeX
@ARTICLE{FITPUB11466,
   author = "Hossein Zeinali and Hossein Sameti and Luk\'{a}\v{s} Burget",
   title = "HMM-Based Phrase-Independent i-Vector Extractor for Text-Dependent Speaker Verification",
   pages = "1421--1435",
   journal = "IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING",
   volume = 25,
   number = 7,
   year = 2017,
   ISSN = "2329-9290",
   doi = "10.1109/TASLP.2017.2694708",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11466"
}