Thesis Details
Extensions to Probabilistic Linear Discriminant Analysis for Speaker Recognition
This thesis deals with probabilistic models for automatic speaker verification. In particular, it analyzes in detail the Probabilistic Linear Discriminant Analysis (PLDA) model, which models i-vector representations of speech utterances. The thesis proposes extensions to the standard state-of-the-art PLDA model. The newly proposed Full Posterior Distribution PLDA models the uncertainty associated with the i-vector generation process. A new discriminative approach to training a speaker verification system based on the PLDA model is also proposed.
Compared with the original PLDA, the model extended to account for i-vector uncertainty shows up to 20% relative improvement on tests with short segments of speech. As the test segments get longer (more than one minute), the performance gain of the extended model diminishes, but the extended model is never worse than the baseline. Training data, however, are usually available as sufficiently long segments, and in such cases there is no gain from using the extended model for training. Instead, training can be performed with the original PLDA model, and the extended model can be used when the task is to test on short segments.
The discriminative classifier is based on classifying pairs of i-vectors into two classes representing target and non-target trials. The functional form for obtaining the score for every i-vector pair is derived from the PLDA model, and training is based on logistic regression, minimizing the cross-entropy error function between the correct labeling of all trials and the probabilistic labeling proposed by the system. The results obtained with the discriminatively trained system are similar to those obtained with the generative baseline, but the discriminative approach shows the ability to output better calibrated scores. This property leads to better actual verification performance on an unseen evaluation set, which is an important feature for real-use scenarios.
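The pairwise training idea can be illustrated with a minimal sketch. It assumes the symmetric quadratic score form commonly derived from two-covariance PLDA, s(x1, x2) = x1' L x2 + x2' L x1 + x1' G x1 + x2' G x2 + (x1 + x2)' c + k; the abstract does not state the exact form, so the parameter names L, G, c, k, the helper functions, the synthetic data, and the length normalization of the toy i-vectors are all assumptions of this example rather than the thesis's implementation. Because the score is linear in the parameters, training reduces to logistic regression on an expanded feature vector for each pair.

```python
import numpy as np

def pair_features(x1, x2):
    """Expand an i-vector pair so the assumed PLDA-style score is linear in the weights."""
    cross = np.outer(x1, x2) + np.outer(x2, x1)   # multiplies L (hypothetical name)
    within = np.outer(x1, x1) + np.outer(x2, x2)  # multiplies G (hypothetical name)
    return np.concatenate([cross.ravel(), within.ravel(), x1 + x2, [1.0]])

def cross_entropy(w, feats, labels):
    """Mean logistic loss; labels are +1 (target trial) or -1 (non-target trial)."""
    return np.mean(np.logaddexp(0.0, -labels * (feats @ w)))

def train(feats, labels, lr=0.2, steps=300):
    """Plain gradient descent on the cross-entropy; a toy optimizer, chosen for brevity."""
    w = np.zeros(feats.shape[1])
    for _ in range(steps):
        margins = labels * (feats @ w)
        # sigmoid(-m) written with tanh for numerical stability
        coeff = -labels * 0.5 * (1.0 - np.tanh(0.5 * margins))
        w -= lr * (feats.T @ coeff) / len(labels)
    return w

# Synthetic stand-ins for i-vectors: speaker mean plus session noise (toy data only).
rng = np.random.default_rng(0)
dim, n_spk, per_spk = 4, 20, 4
spk_means = rng.normal(size=(n_spk, dim))
ivecs = spk_means[:, None, :] + 0.3 * rng.normal(size=(n_spk, per_spk, dim))
ivecs = ivecs.reshape(-1, dim)
# Length normalization (an assumption of this sketch) bounds the feature norm,
# which keeps the fixed step size above stable.
ivecs /= np.linalg.norm(ivecs, axis=1, keepdims=True)
spk_ids = np.repeat(np.arange(n_spk), per_spk)

# Every unordered pair of i-vectors is one trial.
idx_a, idx_b = np.triu_indices(len(ivecs), k=1)
feats = np.stack([pair_features(ivecs[a], ivecs[b]) for a, b in zip(idx_a, idx_b)])
labels = np.where(spk_ids[idx_a] == spk_ids[idx_b], 1.0, -1.0)

w = train(feats, labels)
loss_before = cross_entropy(np.zeros_like(w), feats, labels)
loss_after = cross_entropy(w, feats, labels)
print(f"cross-entropy before: {loss_before:.4f}, after: {loss_after:.4f}")
```

The learned weight vector can be reshaped back into the four parameter blocks if needed; a practical system would typically use a second-order optimizer, regularization, and trial weighting to balance the far more numerous non-target trials.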
Speaker Recognition, Gaussian Mixture Model, Subspace Modeling, i-vector, Probabilistic Linear Discriminant Analysis, Discriminative Training
@phdthesis{FITPT347,
  author = "Old\v{r}ich Plchot",
  type = "Ph.D. thesis",
  title = "Extensions to Probabilistic Linear Discriminant Analysis for Speaker Recognition",
  school = "Brno University of Technology, Faculty of Information Technology",
  year = 2014,
  location = "Brno, CZ",
  language = "english",
  url = "https://www.fit.vut.cz/study/phd-thesis/347/"
}