Publication Details

SpeakerBeam: A New Deep Learning Technology for Extracting Speech of a Target Speaker Based on the Speaker's Voice Characteristics

DELCROIX Marc, ŽMOLÍKOVÁ Kateřina, KINOSHITA Keisuke, ARAKI Shoko, OGAWA Atsunori and NAKATANI Tomohiro. SpeakerBeam: A New Deep Learning Technology for Extracting Speech of a Target Speaker Based on the Speaker's Voice Characteristics. NTT Technical Review, vol. 16, no. 11, 2018, pp. 19-24. ISSN 1348-3447. Available from: https://www.ntt-review.jp/archive/ntttechnical.php?contents=ntr201811all.pdf&mode=show_pdf
Czech title
SpeakerBeam: Nová technologie hlubokého učení pro extrakci řeči cílového mluvčího na základě jeho hlasových charakteristik
Type
journal article
Language
English
Authors
Delcroix Marc (NTT)
Žmolíková Kateřina, Ing., Ph.D. (DCGM FIT BUT)
Kinoshita Keisuke (NTT)
Araki Shoko (NTT)
Ogawa Atsunori (NTT)
Nakatani Tomohiro (NTT)
Keywords

deep learning, target speaker extraction, SpeakerBeam

Abstract

In a noisy environment such as a cocktail party, humans can focus on listening to a desired speaker, an ability known as selective hearing. Current approaches developed to realize computational selective hearing require knowing the position of the target speaker, which limits their practical usage. This article introduces SpeakerBeam, a deep-learning-based approach for computational selective hearing based on the characteristics of the target speaker's voice. SpeakerBeam requires only a small amount of speech data from the target speaker to compute his/her voice characteristics. It can then extract the speech of that speaker regardless of his/her position or the number of speakers talking in the background.

Published
2018
Pages
19-24
Journal
NTT Technical Review, vol. 16, no. 11, ISSN 1348-3447
Publisher
NTT Corporation
BibTeX
@ARTICLE{FITPUB12961,
   author = "Marc Delcroix and Kate\v{r}ina \v{Z}mol\'{i}kov\'{a} and Keisuke Kinoshita and Shoko Araki and Atsunori Ogawa and Tomohiro Nakatani",
   title = "SpeakerBeam: A New Deep Learning Technology for Extracting Speech of a Target Speaker Based on the Speaker's Voice Characteristics",
   pages = "19--24",
   journal = "NTT Technical Review",
   volume = 16,
   number = 11,
   year = 2018,
   ISSN = "1348-3447",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12961"
}