Publication Details

Learnable Sparse Filterbank for Speaker Verification

PENG Junyi, GU Rongzhi, MOŠNER Ladislav, PLCHOT Oldřich, BURGET Lukáš and ČERNOCKÝ Jan. Learnable Sparse Filterbank for Speaker Verification. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Incheon: International Speech Communication Association, 2022, pp. 5110-5114. ISSN 1990-9772. Available from: https://www.isca-speech.org/archive/pdfs/interspeech_2022/peng22e_interspeech.pdf

Czech title

Naučitelná řídká banka filtrů pro ověřování mluvčích

Type

conference paper

Language

English

Authors

Peng Junyi, Msc. Eng. (DCGM FIT BUT)
Gu Rongzhi (PKUSZ)
Mošner Ladislav, Ing. (DCGM FIT BUT)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)

URL

https://www.isca-speech.org/archive/pdfs/interspeech_2022/peng22e_interspeech.pdf

Keywords

learnable filter, sparse filtering, sparsity, speaker verification

Abstract

Recently, feature extraction with learnable filters has been extensively investigated for speaker verification systems, with filters learned in both the time and frequency domains. Most of the learned schemes, however, end up with filters close to their initialization (e.g., a Mel filterbank) or filters strongly limited by their constraints. In this paper, we propose a novel learnable sparse filterbank, named LearnSF, which exclusively optimizes the sparsity of the filterbank and does not explicitly constrain the filters to follow a pre-defined distribution. After standard pre-processing (STFT and squaring of the magnitude spectrum), the learnable sparse filterbank is applied, and its normalized outputs are fed into a neural network that predicts the speaker identity. We evaluated the proposed approach on both the VoxCeleb and CNCeleb datasets. The experimental results demonstrate the effectiveness of the proposed LearnSF compared to both widely used acoustic features and existing parameterized learnable front-ends.
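
The front-end pipeline named in the abstract (STFT, squared magnitude spectrum, filterbank, normalization) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the frame length, hop size, number of filters, log compression, per-utterance mean normalization, and the L1-style sparsity penalty are all assumptions chosen for illustration, and the function names are hypothetical. In a trainable system the weight matrix would be optimized jointly with the speaker network; here it is only randomly initialized.

```python
import numpy as np

def power_spectrogram(signal, n_fft=512, hop=160):
    """Frame the signal, apply a Hann window, and return |STFT|^2 per frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)

def filterbank_features(power, weights, eps=1e-6):
    """Apply a non-negative filterbank, log-compress, and mean-normalize over time.

    Log compression and mean normalization are assumptions; the abstract
    only states that the filterbank outputs are normalized.
    """
    fbank = power @ np.abs(weights).T  # (n_frames, n_filters)
    log_fbank = np.log(fbank + eps)
    return log_fbank - log_fbank.mean(axis=0, keepdims=True)

def l1_sparsity_penalty(weights):
    """Assumed stand-in for a sparsity objective: mean absolute filter weight."""
    return np.abs(weights).mean()

rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)       # 1 s of noise at an assumed 16 kHz rate
weights = rng.standard_normal((80, 257))  # 80 hypothetical filters over 257 bins
power = power_spectrogram(signal)
features = filterbank_features(power, weights)
penalty = l1_sparsity_penalty(weights)
```

In this sketch `features` would be the input to the speaker-embedding network, and `penalty` would be added to the training loss to encourage sparse filters. The sparsity criterion actually used by LearnSF is defined in the paper itself.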

Published

2022

Pages

5110-5114

Journal

Proceedings of Interspeech - on-line, no. 9, ISSN 1990-9772

Proceedings

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Conference

Interspeech Conference, Incheon, KR

Publisher

International Speech Communication Association

Place

Incheon, KR

DOI

10.21437/Interspeech.2022-11309

UT WoS

000900724505058

EID Scopus

2-s2.0-85140077879

BibTeX

@INPROCEEDINGS{FITPUB12851,
   author = "Junyi Peng and Rongzhi Gu and Ladislav Mo\v{s}ner and Old\v{r}ich Plchot and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "Learnable Sparse Filterbank for Speaker Verification",
   pages = "5110--5114",
   booktitle = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
   journal = "Proceedings of Interspeech - on-line",
   number = 9,
   year = 2022,
   location = "Incheon, KR",
   publisher = "International Speech Communication Association",
   ISSN = "1990-9772",
   doi = "10.21437/Interspeech.2022-11309",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12851"
}