Publication Details

Improving Speaker Verification with Self-Pretrained Transformer Models

PENG Junyi, PLCHOT Oldřich, STAFYLAKIS Themos, MOŠNER Ladislav, BURGET Lukáš and ČERNOCKÝ Jan. Improving Speaker Verification with Self-Pretrained Transformer Models. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Dublin: International Speech Communication Association, 2023, pp. 5361-5365. ISSN 1990-9772. Available from: https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23_interspeech.pdf
Czech title
Zlepšení ověřování mluvčího pomocí samoučících se modelů typu Transformer
Type
conference paper
Language
english
Authors
Peng Junyi, MSc. Eng. (DCGM FIT BUT)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Stafylakis Themos (OMILIA)
Mošner Ladislav, Ing. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
URL
https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23_interspeech.pdf
Keywords

speaker verification, pre-trained speech transformer model, pre-training

Abstract

Recently, fine-tuning large pre-trained Transformer models on downstream datasets has attracted rising interest. Despite their success, it is still challenging to disentangle the benefits of large-scale datasets and Transformer structures from the limitations of the pre-training. In this paper, we introduce a hierarchical training approach, named self-pretraining, in which Transformer models are pre-trained and fine-tuned on the same dataset. Three pre-trained models, including HuBERT, Conformer, and WavLM, are evaluated on four speaker verification datasets of varying sizes. Our experiments show that these self-pretrained models achieve competitive performance on downstream speaker verification tasks, such as VoxCeleb1 and CNCeleb1, with only one-third of the data compared to LibriSpeech pre-training. Furthermore, when pre-trained only on VoxCeleb2-dev, the Conformer model outperforms the one pre-trained on 94k hours of data using the same fine-tuning settings.

Published
2023
Pages
5361-5365
Journal
Proceedings of Interspeech - on-line, vol. 2023, no. 8, ISSN 1990-9772
Proceedings
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Conference
Interspeech Conference, Dublin, IE
Publisher
International Speech Communication Association
Place
Dublin, IE
DOI
10.21437/Interspeech.2023-453
BibTeX
@INPROCEEDINGS{FITPUB13112,
   author = "Junyi Peng and Old\v{r}ich Plchot and Themos Stafylakis and Ladislav Mo\v{s}ner and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "Improving Speaker Verification with Self-Pretrained Transformer Models",
   pages = "5361--5365",
   booktitle = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
   journal = "Proceedings of Interspeech - on-line",
   volume = 2023,
   number = 8,
   year = 2023,
   location = "Dublin, IE",
   publisher = "International Speech Communication Association",
   ISSN = "1990-9772",
   doi = "10.21437/Interspeech.2023-453",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13112"
}