Publication Details

Investigation of Specaugment for Deep Speaker Embedding Learning

WANG Shuai, ROHDIN Johan A., PLCHOT Oldřich, BURGET Lukáš, YU Kai and ČERNOCKÝ Jan. Investigation of Specaugment for Deep Speaker Embedding Learning. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Barcelona: IEEE Signal Processing Society, 2020, pp. 7139-7143. ISBN 978-1-5090-6631-5. Available from: https://ieeexplore.ieee.org/document/9053481/authors#authors

Czech title

Výzkum metody Specaugment pro hluboké učení embeddingů mluvčích (English: Investigation of the Specaugment method for deep learning of speaker embeddings)

Type

conference paper

Language

English

Authors

Wang Shuai (DCGM FIT BUT)
Rohdin Johan A., Dr. (DCGM FIT BUT)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Yu Kai (SJTU)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)

Keywords

speaker embedding, on-the-fly data augmentation, speaker verification, specaugment

Abstract

SpecAugment is a recently proposed data augmentation method for speech recognition. By randomly masking bands in the log Mel spectrogram, this method yields impressive improvements in recognition performance. In this paper, we investigate the use of SpecAugment for speaker verification tasks. Two different models, namely a 1-D convolutional TDNN and a 2-D convolutional ResNet34, each trained with either Softmax or AAM-Softmax loss, are used to analyze SpecAugment's effectiveness. Experiments are carried out on the VoxCeleb and NIST SRE 2016 datasets. By applying SpecAugment to the original clean data on the fly, without complex offline data augmentation methods, we obtained 3.72% and 11.49% EER on NIST SRE 2016 Cantonese and Tagalog, respectively. On the VoxCeleb1 evaluation set, we obtained 1.47% EER.
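The abstract describes SpecAugment as randomly masking bands of the log Mel spectrogram and applying it on the fly to clean training data. The following is a minimal sketch of that masking idea in Python with NumPy, not the authors' implementation. The function name `spec_augment`, the number of masks, the maximum mask widths, and filling masked regions with the utterance mean are all hypothetical choices for illustration; the time warping step of the original SpecAugment recipe is omitted.

```python
import numpy as np


def spec_augment(log_mel, num_freq_masks=1, max_freq_width=8,
                 num_time_masks=1, max_time_width=10, rng=None):
    """Return a masked copy of a (frames x mel_bins) log Mel spectrogram.

    Hypothetical defaults: one frequency mask of up to 8 bins and one time
    mask of up to 10 frames. The input array is never modified.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = log_mel.copy()
    num_frames, num_bins = out.shape
    # Assumption: masked cells are set to the utterance mean (equivalent to
    # zero for mean-normalized features).
    fill = out.mean()

    # Frequency masking: blank a band of consecutive mel bins in every frame.
    for _ in range(num_freq_masks):
        width = min(int(rng.integers(0, max_freq_width + 1)), num_bins)
        start = int(rng.integers(0, num_bins - width + 1))
        out[:, start:start + width] = fill

    # Time masking: blank a span of consecutive frames across all mel bins.
    for _ in range(num_time_masks):
        width = min(int(rng.integers(0, max_time_width + 1)), num_frames)
        start = int(rng.integers(0, num_frames - width + 1))
        out[start:start + width, :] = fill

    return out


if __name__ == "__main__":
    # Dummy features: 200 frames x 40 mel bins of random values.
    gen = np.random.default_rng(0)
    feats = gen.standard_normal((200, 40))
    augmented = spec_augment(feats, rng=gen)
    print(augmented.shape)
```

Because a fresh mask is drawn every time the function is called, invoking it inside the training data loader gives each epoch a differently augmented view of the same clean utterance, which is what "on the fly" means here, as opposed to storing augmented copies offline.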

Published

2020

Pages

7139-7143

Proceedings

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Conference

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), Barcelona, ES

ISBN

978-1-5090-6631-5

Publisher

IEEE Signal Processing Society

Place

Barcelona, ES

DOI

10.1109/ICASSP40776.2020.9053481

UT WoS

000615970407081

EID Scopus

2-s2.0-85089236385

BibTeX

@INPROCEEDINGS{FITPUB12278,
   author = "Shuai Wang and Johan A. Rohdin and Old\v{r}ich Plchot and Luk\'{a}\v{s} Burget and Kai Yu and Jan \v{C}ernock\'{y}",
   title = "Investigation of Specaugment for Deep Speaker Embedding Learning",
   pages = "7139--7143",
   booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
   year = 2020,
   location = "Barcelona, ES",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-5090-6631-5",
   doi = "10.1109/ICASSP40776.2020.9053481",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12278"
}