Publication Details

Analysis of DNN Speech Signal Enhancement for Robust Speaker Recognition

NOVOTNÝ Ondřej, PLCHOT Oldřich, GLEMBEK Ondřej, ČERNOCKÝ Jan and BURGET Lukáš. Analysis of DNN Speech Signal Enhancement for Robust Speaker Recognition. Computer Speech and Language, vol. 2019, no. 58, pp. 403-421. ISSN 0885-2308. Available from: https://www.sciencedirect.com/science/article/pii/S0885230818303607

Czech title

Analýza čištění signálu pomocí DNN pro robustní rozpoznávání mluvčího

Type

journal article

Language

English

Authors

Novotný Ondřej, Ing., Ph.D. (DCGM FIT BUT)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Glembek Ondřej, Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)

Keywords

Speaker verification; Signal enhancement; Autoencoder; Neural network; Robustness; Embedding

Abstract

In this work, we present an analysis of a DNN-based autoencoder for speech enhancement, dereverberation and denoising. The target application is a robust speaker verification (SV) system. We start our approach by carefully designing a data augmentation process to cover a wide range of acoustic conditions and to obtain rich training data for various components of our SV system. We augment several well-known databases used in SV with artificially noised and reverberated data and we use them to train a denoising autoencoder (mapping noisy and reverberated speech to its clean version) as well as an x-vector extractor which is currently considered as state-of-the-art in SV. Later, we use the autoencoder as a preprocessing step for a text-independent SV system. We compare results achieved with autoencoder enhancement, multi-condition PLDA training and their simultaneous use. We present a detailed analysis with various conditions of NIST SRE 2010, 2016, PRISM and with re-transmitted data. We conclude that the proposed preprocessing can significantly improve both i-vector and x-vector baselines and that this technique can be used to build a robust SV system for various target domains.

Published

2019

Pages

403-421

Journal

Computer Speech and Language, vol. 2019, no. 58, ISSN 0885-2308

Publisher

Elsevier Science

DOI

10.1016/j.csl.2019.06.004

UT WoS

000477663800022

EID Scopus

2-s2.0-85067550556

BibTeX

@ARTICLE{FITPUB12039,
   author = "Ond\v{r}ej Novotn\'{y} and Old\v{r}ich Plchot and Ond\v{r}ej Glembek and Jan \v{C}ernock\'{y} and Luk\'{a}\v{s} Burget",
   title = "Analysis of DNN Speech Signal Enhancement for Robust Speaker Recognition",
   pages = "403--421",
   journal = "Computer Speech and Language",
   volume = 2019,
   number = 58,
   year = 2019,
   ISSN = "0885-2308",
   doi = "10.1016/j.csl.2019.06.004",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12039"
}