Publication Details

Utilizing VOiCES dataset for multichannel speaker verification with beamforming

MOŠNER Ladislav, PLCHOT Oldřich, ROHDIN Johan A. and ČERNOCKÝ Jan. Utilizing VOiCES dataset for multichannel speaker verification with beamforming. In: Proceedings of Odyssey 2020 The Speaker and Language Recognition Workshop. Tokyo: International Speech Communication Association, 2020, pp. 187-193. ISSN 2312-2846. Available from: https://www.isca-speech.org/archive/Odyssey_2020/abstracts/80.html

Czech title

Využití datasetu VOiCES pro multikanálové ověřování řečníka se směrováním akustického paprsku

Type

conference paper

Language

English

Authors

Mošner Ladislav, Ing. (DCGM FIT BUT)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Rohdin Johan A., Dr. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)

Keywords

multichannel speaker verification, application-aware beamforming

Abstract

The VOiCES from a Distance Challenge 2019 aimed at the evaluation of speaker verification (SV) systems using single-channel trials based on the Voices Obscured in Complex Environmental Settings (VOiCES) corpus. Since the corpus comprises recordings of the same utterances captured simultaneously by multiple microphones in the same environments, it is also suitable for multichannel experiments. In this work, we design a multichannel dataset as well as development and evaluation trials for SV inspired by the VOiCES challenge. Alternatives discarding harmful microphones are presented as well. We assess the utilization of the created dataset for x-vector-based SV with beamforming as a front end. Standard fixed beamforming and NN-supported beamforming using simulated data and ideal binary masks (IBM) are compared with another variant of NN-supported beamforming that is trained solely on the VOiCES data. The lack of data revealed by experiments with the beamformer trained on VOiCES data was tackled by means of a variant of SpecAugment applied to magnitude spectra. This approach led to as much as 10% relative improvement in EER, pushing results closer to those obtained by a good beamformer based on IBMs.

Published

2020

Pages

187-193

Journal

Proceedings of Odyssey: The Speaker and Language Recognition Workshop, vol. 2020, no. 11, ISSN 2312-2846

Proceedings

Proceedings of Odyssey 2020 The Speaker and Language Recognition Workshop

Conference

Odyssey 2020: The Speaker and Language Recognition Workshop, Tokyo, JP

Publisher

International Speech Communication Association

Place

Tokyo, JP

DOI

10.21437/Odyssey.2020-27

BibTeX

@INPROCEEDINGS{FITPUB12289,
   author = "Ladislav Mo\v{s}ner and Old\v{r}ich Plchot and Johan A. Rohdin and Jan \v{C}ernock\'{y}",
   title = "Utilizing VOiCES dataset for multichannel speaker verification with beamforming",
   pages = "187--193",
   booktitle = "Proceedings of Odyssey 2020 The Speaker and Language Recognition Workshop",
   journal = "Proceedings of Odyssey: The Speaker and Language Recognition Workshop",
   volume = 2020,
   number = 11,
   year = 2020,
   location = "Tokyo, JP",
   publisher = "International Speech Communication Association",
   ISSN = "2312-2846",
   doi = "10.21437/Odyssey.2020-27",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12289"
}