Publication Details

Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters

PENG Junyi, STAFYLAKIS Themos, GU Rongzhi, PLCHOT Oldřich, MOŠNER Ladislav, BURGET Lukáš and ČERNOCKÝ Jan. Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Rhodes Island: IEEE Signal Processing Society, 2023, pp. 1-5. ISBN 978-1-7281-6327-7. Available from: https://ieeexplore.ieee.org/document/10094795
Czech title
Parametrově efektivní přenosové učení předtrénovaných modelů typu transformer pomocí adaptérů pro úlohu ověřování mluvčích
Type
conference paper
Language
english
Authors
Peng Junyi, MSc. Eng. (DCGM FIT BUT)
Stafylakis Themos (OMILIA)
Gu Rongzhi (PKUSZ)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Mošner Ladislav, Ing. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
URL
https://ieeexplore.ieee.org/document/10094795
Keywords

Speaker verification, pre-trained model, adapter, fine-tuning, transfer learning

Abstract

Recently, pre-trained Transformer models have received rising interest in the field of speech processing thanks to their great success in various downstream tasks. However, most fine-tuning approaches update all the parameters of the pre-trained model, which becomes prohibitive as the model size grows and sometimes results in overfitting on small datasets. In this paper, we conduct a comprehensive analysis of applying parameter-efficient transfer learning (PETL) methods to reduce the number of learnable parameters required for adapting to speaker verification tasks. Specifically, during the fine-tuning process, the pre-trained models are frozen, and only lightweight modules inserted into each Transformer block are trainable (a method known as adapters). Moreover, to boost performance in a cross-language low-resource scenario, the Transformer model is further tuned on a large intermediate dataset before being fine-tuned on the small target dataset. Updating fewer than 4% of the parameters, our proposed PETL-based methods achieve performance comparable to full fine-tuning methods (Vox1-O: 0.55%, Vox1-E: 0.82%, Vox1-H: 1.73%).
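
The adapter mechanism described in the abstract can be summarised in a short PyTorch sketch: each pre-trained Transformer block is frozen, and only a small bottleneck module with a residual connection is trained. This is a minimal illustration under assumed module names, attribute layout, and sizes (e.g. hidden size 768, bottleneck 64), not the authors' implementation.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    # Bottleneck adapter: down-projection, nonlinearity, up-projection,
    # plus a residual connection so the frozen block's output is preserved at initialisation.
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

class AdaptedBlock(nn.Module):
    # Wraps one pre-trained Transformer block: its weights are frozen,
    # only the attached adapter remains trainable.
    def __init__(self, pretrained_block: nn.Module, dim: int):
        super().__init__()
        self.block = pretrained_block
        for p in self.block.parameters():
            p.requires_grad = False          # freeze the pre-trained backbone
        self.adapter = Adapter(dim)          # lightweight trainable module

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.block(x))

# Usage sketch (encoder.blocks and the hidden size 768 are illustrative assumptions):
# encoder.blocks = nn.ModuleList(AdaptedBlock(b, dim=768) for b in encoder.blocks)
# optimizer = torch.optim.Adam((p for p in encoder.parameters() if p.requires_grad), lr=1e-4)

With this setup, only the adapter parameters (well under 4% of the model, depending on the bottleneck size) are passed to the optimizer, which matches the parameter-efficiency argument made in the abstract.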

Published
2023
Pages
1-5
Proceedings
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Conference
2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece
ISBN
978-1-7281-6327-7
Publisher
IEEE Signal Processing Society
Place
Rhodes Island, GR
DOI
10.1109/ICASSP49357.2023.10094795
EID Scopus
BibTeX
@INPROCEEDINGS{FITPUB13053,
   author = "Junyi Peng and Themos Stafylakis and Rongzhi Gu and Old\v{r}ich Plchot and Ladislav Mo\v{s}ner and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters",
   pages = "1--5",
   booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
   year = 2023,
   location = "Rhodes Island, GR",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-7281-6327-7",
   doi = "10.1109/ICASSP49357.2023.10094795",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13053"
}