Publication Details

BUT/JHU System Description for CHiME-8 NOTSOFAR-1 Challenge

POLOK Alexander, KLEMENT Dominik, HAN Jiangyu, SEDLÁČEK Šimon, YUSUF Bolaji, MACIEJEWSKI Matthew, WIESNER Matthew and BURGET Lukáš. BUT/JHU System Description for CHiME-8 NOTSOFAR-1 Challenge. In: Proceedings of CHiME 2024 Workshop. Kos Island: International Speech Communication Association, 2024, pp. 18-22. Available from: https://www.isca-archive.org/chime_2024/polok24_chime.pdf
Czech title
Popis VUT/JHU systému pro evaluaci CHiME-8 NOTSOFAR-1
Type
conference paper
Language
English
Authors
Polok Alexander, Ing. (DCGM FIT BUT)
Klement Dominik, Bc. (FIT BUT)
Han Jiangyu, M.Eng. (DCGM FIT BUT)
Sedláček Šimon, Ing. (DCGM FIT BUT)
Yusuf Bolaji (DCGM FIT BUT)
Maciejewski Matthew (JHU)
Wiesner Matthew (JHU)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
URL
Keywords

multi-talker speech recognition, CHiME-8, NOTSOFAR-1, target-speaker

Abstract

This paper presents our method for tackling the CHiME-8 challenge's NOTSOFAR-1 task, which requires participants to perform multi-speaker automatic speech recognition (ASR) using audio from distant microphone arrays. We modify the Pyannote3 diarization pipeline, incorporating pre-trained WavLM as local EEND to adapt effectively to new domains, and we introduce two diarization-aware approaches to ASR by conditioning Whisper on diarization outputs for target-speaker ASR. The first method, which we refer to as Query-Key Biasing, modifies Whisper's attention mechanism and positional embeddings with a learnable attention mask to exclude non-target speaker segments in the audio. The second method, called Frame-Level Diarization-Dependent Transformations, applies affine, diarization-dependent transformations with trainable parameters to the inputs of one or more transformer blocks. We also extend both the ASR and diarization systems to a multichannel setup by incorporating cross-channel communication into our models. Finally, we report the performance of these approaches on the NOTSOFAR-1 dataset.
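
The idea behind Frame-Level Diarization-Dependent Transformations can be illustrated with a minimal sketch. This is not the authors' implementation: the four frame classes, the diagonal (per-dimension) form of the affine transform, the probability-weighted mixing, and the names `init_identity` and `apply_fddt` are all assumptions made for illustration. The abstract states only that affine, diarization-dependent transformations with trainable parameters are applied to the inputs of transformer blocks.

```python
from typing import Dict, List, Tuple

# Hypothetical per-frame diarization classes (an assumption, not from the abstract).
SILENCE, TARGET, NON_TARGET, OVERLAP = "silence", "target", "non_target", "overlap"
CLASSES = (SILENCE, TARGET, NON_TARGET, OVERLAP)

# One affine transform per class: (per-dimension scale, per-dimension bias).
# A diagonal scale is used here for brevity; a full matrix would also be affine.
Affine = Tuple[List[float], List[float]]


def init_identity(dim: int) -> Dict[str, Affine]:
    """Start every class at the identity transform (scale 1, bias 0)."""
    return {c: ([1.0] * dim, [0.0] * dim) for c in CLASSES}


def apply_fddt(
    frames: List[List[float]],
    class_probs: List[Dict[str, float]],
    params: Dict[str, Affine],
) -> List[List[float]]:
    """Transform each frame by a probability-weighted mix of class affines.

    frames:      T frames, each a D-dimensional feature vector.
    class_probs: for each frame, a mapping class -> probability taken from
                 the diarization output (assumed to sum to 1 per frame).
    params:      the per-class parameters that would be trained.
    """
    out = []
    for x, probs in zip(frames, class_probs):
        y = [0.0] * len(x)
        for cls, p in probs.items():
            scale, bias = params[cls]
            for d, value in enumerate(x):
                y[d] += p * (scale[d] * value + bias[d])
        out.append(y)
    return out


if __name__ == "__main__":
    params = init_identity(dim=2)
    # Pretend training learned to suppress frames where only others speak.
    params[NON_TARGET] = ([0.0, 0.0], [0.0, 0.0])
    frames = [[1.0, 2.0], [3.0, 4.0]]
    probs = [{TARGET: 1.0}, {NON_TARGET: 1.0}]
    print(apply_fddt(frames, probs, params))
```

With identity initialization the transform leaves the features unchanged, so in this sketch a pre-trained model would behave as before until the per-class parameters are updated.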

Published
2024
Pages
18-22
Proceedings
Proceedings of CHiME 2024 Workshop
Conference
8th International Workshop on Speech Processing in Everyday Environments (CHiME 2024), Kos Island - a satellite event of the Interspeech 2024 conference, GR
Publisher
International Speech Communication Association
Place
Kos Island, GR
DOI
10.21437/CHiME.2024-4
BibTeX
@INPROCEEDINGS{FITPUB13338,
   author = "Alexander Polok and Dominik Klement and Jiangyu Han and \v{S}imon Sedl\'{a}\v{c}ek and Bolaji Yusuf and Matthew Maciejewski and Matthew Wiesner and Luk\'{a}\v{s} Burget",
   title = "BUT/JHU System Description for CHiME-8 NOTSOFAR-1 Challenge",
   pages = "18--22",
   booktitle = "Proceedings of CHiME 2024 Workshop",
   year = 2024,
   location = "Kos Island, GR",
   publisher = "International Speech Communication Association",
   doi = "10.21437/CHiME.2024-4",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13338"
}