Publication Details

Audio Enhancing With DNN Autoencoder For Speaker Recognition

PLCHOT Oldřich, BURGET Lukáš, ARONOWITZ Hagai and MATĚJKA Pavel. Audio Enhancing With DNN Autoencoder For Speaker Recognition. In: Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), 2016. Shanghai: IEEE Signal Processing Society, 2016, pp. 5090-5094. ISBN 978-1-4799-9988-0.

Czech title

Obohacování audia pomocí DNN autoenkodéru pro rozpoznávání mluvčího

Type

conference paper

Language

English

Authors

Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Aronowitz Hagai (IBM)
Matějka Pavel, Ing., Ph.D. (DCGM FIT BUT)

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2016/plchot_icassp2016_0005090.pdf PDF

Keywords

speaker recognition, denoising, de-reverberation, neural networks, DNN

Abstract

In this paper, we present the design of a DNN-based autoencoder for speech enhancement and its use in speaker recognition systems for distant-microphone and noisy data. We started by augmenting the Fisher database with artificially noised and reverberated data and trained the autoencoder to map noisy and reverberated speech to its clean version. We use the autoencoder as a preprocessing step in the later stage of modelling in state-of-the-art text-dependent and text-independent speaker recognition systems. We report relative improvements of up to 50% for the text-dependent system and up to 48% for the text-independent one. For the text-independent system, we present a more detailed analysis on various conditions of NIST SRE 2010 and PRISM, suggesting that the proposed preprocessing is a promising and efficient way to build a robust speaker recognition system for distant-microphone and noisy data.
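
The augmentation step mentioned in the abstract (artificially adding noise and reverberation to clean speech) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `add_noise_at_snr` and `reverberate`, the 10 dB target, and the synthetic signals standing in for speech, noise, and a room impulse response are all assumptions made for illustration only.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so the signal-to-noise ratio equals `snr_db`, then add it."""
    # Repeat or truncate the noise so it matches the signal length.
    noise = np.resize(noise, clean.shape)
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Noise power required for the requested SNR (in dB).
    target_noise_power = clean_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    return clean + scale * noise

def reverberate(clean, rir):
    """Convolve with a room impulse response and trim to the input length."""
    return np.convolve(clean, rir)[: len(clean)]

# Synthetic stand-ins (assumptions): white noise instead of real recordings.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noise = rng.standard_normal(8000)
# Hypothetical exponentially decaying impulse response.
rir = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)
rir[0] = 1.0

reverberated = reverberate(clean, rir)
noisy = add_noise_at_snr(reverberated, noise, snr_db=10.0)
```

In a setup like the one described, pairs of (`noisy`, `clean`) signals, or features derived from them, would serve as the input and target for training the enhancement network.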

Annotation

We have presented our approach to building a robust speaker recognition system. We concentrated on improving performance on noisy and reverberant data by means of a DNN autoencoder, which is trained to remove both additive noise and reverberation from audio. We showed that our method significantly improves the performance of both state-of-the-art text-dependent and text-independent speaker recognition systems in the domain of distant-microphone recordings. We analyzed and discussed the effect of the proposed method on both real-world data and artificially created data. The artificially created data allowed us to measure the effect of enhancement separately for distortions caused by additive noise and by reverberation.

Published

2016

Pages

5090-5094

Proceedings

Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), 2016

Conference

41st IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, CN

ISBN

978-1-4799-9988-0

Publisher

IEEE Signal Processing Society

Place

Shanghai, CN

DOI

10.1109/ICASSP.2016.7472647

UT WoS

000388373405048

EID Scopus

2-s2.0-84973277824

BibTeX

@INPROCEEDINGS{FITPUB11139,
   author = "Old\v{r}ich Plchot and Luk\'{a}\v{s} Burget and Hagai Aronowitz and Pavel Mat\v{e}jka",
   title = "Audio Enhancing With DNN Autoencoder For Speaker Recognition",
   pages = "5090--5094",
   booktitle = "Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), 2016",
   year = 2016,
   location = "Shanghai, CN",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-4799-9988-0",
   doi = "10.1109/ICASSP.2016.7472647",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11139"
}