Publication Details

Optimization of Speaker-aware Multichannel Speech Extraction with ASR Criterion

ŽMOLÍKOVÁ Kateřina, DELCROIX Marc, KINOSHITA Keisuke, HIGUCHI Takuya, NAKATANI Tomohiro and ČERNOCKÝ Jan. Optimization of Speaker-aware Multichannel Speech Extraction with ASR Criterion. In: Proceedings of ICASSP 2018. Calgary: IEEE Signal Processing Society, 2018, pp. 6702-6706. ISBN 978-1-5386-4658-8.
Czech title
Optimalizace multikanálové extrakce řeči s informací o mluvčím pomocí ASR kritéria
Type
conference paper
Language
English
Authors
Žmolíková Kateřina, Ing., Ph.D. (DCGM FIT BUT)
Delcroix Marc (NTT)
Kinoshita Keisuke (NTT)
Higuchi Takuya (NTT)
Nakatani Tomohiro (NTT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
Keywords

Speaker extraction, joint training, speaker adaptive neural network, beamforming, speech recognition

Abstract

This paper addresses the problem of recognizing speech corrupted by overlapping speakers in a multichannel setting. To extract a target speaker from the mixture, we use a neural-network-based beamformer, which uses masks estimated by a neural network to compute statistically optimal spatial filters. Following our previous work, we inform the neural network about the target speaker using information extracted from an adaptation utterance, enabling the network to track the target speaker. While in the previous work this method was used to extract the speaker separately and then pass the preprocessed speech to a speech recognition system, here we explore training both systems jointly with a common speech recognition criterion. We show that integrating the two systems and training for the final objective improves performance. In addition, the integration enables further sharing of information between the acoustic model and the speaker extraction system by using the predicted HMM-state posteriors to refine the masks used for beamforming.
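
As a rough illustration of the mask-based beamforming idea mentioned in the abstract, the following NumPy sketch estimates spatial covariance matrices from time-frequency masks and derives spatial filters from them. It is not the authors' implementation: the abstract does not say which beamformer the paper uses, so an MVDR beamformer in its reference-channel formulation is assumed here as one common choice. The function names, array layouts, and the diagonal-loading constant are hypothetical, and the masks are taken as given inputs rather than produced by a speaker-aware network.

```python
import numpy as np


def masked_covariance(stft, mask):
    """Mask-weighted spatial covariance per frequency bin.

    stft: complex array, shape (channels, frames, freqs)  (assumed layout)
    mask: real array in [0, 1], shape (frames, freqs)
    returns: complex array, shape (freqs, channels, channels)
    """
    weighted = stft * mask[None, :, :]
    # Sum over frames of m(t, f) * y(t, f) y(t, f)^H
    cov = np.einsum("ctf,dtf->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=0), 1e-10)
    return cov / norm[:, None, None]


def mvdr_weights(cov_target, cov_noise, ref_channel=0, eps=1e-6):
    """MVDR filters w(f) = (Phi_n^-1 Phi_s / trace(Phi_n^-1 Phi_s)) u_ref.

    cov_target, cov_noise: shape (freqs, channels, channels)
    returns: complex array, shape (freqs, channels)
    """
    n_ch = cov_noise.shape[-1]
    # Small diagonal loading keeps the noise covariance invertible.
    loaded = cov_noise + eps * np.eye(n_ch)
    numerator = np.linalg.solve(loaded, cov_target)
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    return numerator[..., ref_channel] / trace[:, None]


def apply_beamformer(weights, stft):
    """Apply w(f)^H y(t, f); returns shape (frames, freqs)."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

In such a pipeline, a target-speaker mask and its complement (or a separately estimated interference mask) would be passed to `masked_covariance` to obtain the two covariance matrices, and the result of `apply_beamformer` would be the single-channel signal handed to the recognizer. Because every step is differentiable, a setup of this kind can in principle be trained with a recognition loss, which is the joint-training direction the abstract describes.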

Published
2018
Pages
6702-6706
Proceedings
Proceedings of ICASSP 2018
Conference
IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, CA
ISBN
978-1-5386-4658-8
Publisher
IEEE Signal Processing Society
Place
Calgary, CA
DOI
10.1109/ICASSP.2018.8461533
UT WoS
000446384606172
BibTeX
@INPROCEEDINGS{FITPUB11722,
   author = "Kate\v{r}ina \v{Z}mol\'{i}kov\'{a} and Marc Delcroix and Keisuke Kinoshita and Takuya Higuchi and Tomohiro Nakatani and Jan \v{C}ernock\'{y}",
   title = "Optimization of Speaker-aware Multichannel Speech Extraction with ASR Criterion",
   pages = "6702--6706",
   booktitle = "Proceedings of ICASSP 2018",
   year = 2018,
   location = "Calgary, CA",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-5386-4658-8",
   doi = "10.1109/ICASSP.2018.8461533",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11722"
}