Publication Details
HMM-Based Phrase-Independent i-Vector Extractor for Text-Dependent Speaker Verification
Bottleneck features, DNN, hidden Markov model (HMM), i-vector, text-dependent speaker verification.
This article describes a new HMM structure for text-dependent speaker verification that lets the authors combine the ability of HMMs to model time sequences with the established i-vector technique.
Abstract: The low-dimensional i-vector representation of speech segments is used in state-of-the-art text-independent speaker verification systems. However, i-vectors were deemed unsuitable for the text-dependent task, where simpler and older speaker recognition approaches were found to be more effective. In this work, we propose a straightforward hidden Markov model (HMM) based extension of the i-vector approach, which allows i-vectors to be successfully applied to text-dependent speaker verification. In our approach, the Universal Background Model (UBM) used to train the phrase-independent i-vector extractor is based on a set of monophone HMMs instead of the standard Gaussian Mixture Model (GMM). To compensate for channel variability, we propose to precondition i-vectors using a regularized variant of within-class covariance normalization (WCCN), which can be robustly estimated in a phrase-dependent fashion on the small datasets available for the text-dependent task. The verification scores are cosine similarities between the i-vectors, normalized using phrase-dependent s-norm. The experimental results on the RSR2015 and RedDots databases confirm the effectiveness of the proposed approach, especially in rejecting test utterances with a wrong phrase. A simple MFCC-based i-vector/HMM system performs competitively when compared to very computationally expensive DNN-based approaches or to the conventional relevance MAP GMM-UBM, which does not allow for compact speaker representations. To our knowledge, this paper presents the best published results obtained with a single system on both the RSR2015 and RedDots datasets.
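The back-end described in the abstract (WCCN preconditioning, cosine scoring, s-norm) can be illustrated with a short NumPy sketch. It is not the authors' implementation. The HMM-based i-vector extraction itself is not shown, and the i-vectors are assumed to be given. The regularization `W_reg = (1 - alpha) * W + alpha * I` is an assumed, illustrative choice, and the helper names `within_class_cov`, `wccn_projection`, `cosine` and `s_norm` and the parameter `alpha` are hypothetical. To be "phrase-dependent" in the paper's sense, one would presumably estimate the projection and draw the cohort from i-vectors of a single phrase.

```python
import numpy as np


def within_class_cov(ivectors, labels):
    """Average per-speaker covariance of the i-vectors (the WCCN statistic)."""
    ivectors = np.asarray(ivectors, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    for c in classes:
        x = ivectors[labels == c]
        centered = x - x.mean(axis=0)
        W += centered.T @ centered / len(x)
    return W / len(classes)


def wccn_projection(ivectors, labels, alpha=0.1):
    """Return B with B @ B.T = inv(W_reg); project row i-vectors as X @ B.

    ASSUMPTION: W_reg = (1 - alpha) * W + alpha * I is one plausible
    regularization (hypothetical); the paper's exact variant may differ.
    """
    W = within_class_cov(ivectors, labels)
    W_reg = (1.0 - alpha) * W + alpha * np.eye(W.shape[0])
    L = np.linalg.cholesky(W_reg)  # W_reg = L @ L.T
    return np.linalg.inv(L).T      # B = inv(L).T, hence B @ B.T = inv(W_reg)


def cosine(a, b):
    """Cosine similarity between two i-vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def s_norm(enroll, test, cohort):
    """Symmetric normalization of cosine(enroll, test) against a cohort.

    ASSUMPTION: the cohort holds i-vectors of the same phrase (phrase-dependent).
    """
    raw = cosine(enroll, test)
    e = np.array([cosine(enroll, c) for c in cohort])
    t = np.array([cosine(test, c) for c in cohort])
    return 0.5 * ((raw - e.mean()) / e.std() + (raw - t.mean()) / t.std())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy i-vectors for one phrase: 5 speakers x 20 utterances, dimension 6.
    means = 3.0 * np.eye(6)[:5]
    X = np.repeat(means, 20, axis=0) + 0.3 * rng.normal(size=(100, 6))
    y = np.repeat(np.arange(5), 20)
    Xp = X @ wccn_projection(X, y, alpha=0.1)
    cohort = Xp[40:]  # speakers 2-4 act as the cohort
    print("target trial:", s_norm(Xp[0], Xp[1], cohort))
    print("non-target trial:", s_norm(Xp[0], Xp[25], cohort))
```

With `alpha=0` the projected data have an identity within-class covariance, which is plain WCCN. A positive `alpha` keeps the estimate invertible and stable when only a few utterances per phrase are available, which matches the motivation the abstract gives for regularizing.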
@ARTICLE{FITPUB11466,
  author = "Hossein Zeinali and Hossein Sameti and Luk\'{a}\v{s} Burget",
  title = "HMM-Based Phrase-Independent i-Vector Extractor for Text-Dependent Speaker Verification",
  pages = "1421--1435",
  journal = "IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING",
  volume = 25,
  number = 7,
  year = 2017,
  ISSN = "2329-9290",
  doi = "10.1109/TASLP.2017.2694708",
  language = "english",
  url = "https://www.fit.vut.cz/research/publication/11466"
}