Project Details
Multi-linguality in Speech Technologies (Multi-lingualita v řečových technologiích)
Project Period: 1 January 2020 - 31 August 2023
Project Type: grant
Code: LTAIN19087
Agency: Ministry of Education, Youth and Sports of the Czech Republic
Program: INTER-EXCELLENCE, Subprogram INTER-ACTION
Keywords: multi-linguality, speech recognition, machine learning, data, transfer learning
Speech data mining technologies and speech-based human-machine interfaces have advanced significantly over the past decade, and numerous applications have been successfully commercialized. However, they usually work reliably only in favorable scenarios: in languages with an abundance of training data and in relatively clean environments, such as an office or an apartment. In fast-developing large markets such as India, several severe problems make the exploitation of speech difficult: a multitude of languages (some of them with limited or missing resources), highly noisy conditions (much business in Indian cities is simply done on the street), and highly variable numbers of speakers in a conversation (from the usual two up to whole families). These factors complicate the development of automatic speech recognition (ASR), speaker recognition (SR) and speaker diarization (SD, determining who spoke when). In the proposed project, two established research institutes with a significant track record in multi-lingual ASR, robust SR and SD, Brno University of Technology (BUT) and IIT Madras (IIT-M), have teamed up with an important player on the Indian and global personal electronics markets, Samsung R&D Institute India-Bangalore (SRI-B). Together they propose significant advances in several speech technologies, notably in multi-lingual low-resource ASR. BUT and IIT-M will provide top speech research (based, among others, on the U.S. IARPA Babel and MATERIAL programs, on victories in the IARPA ASpIRE evaluation and in the Interspeech 2018 Low Resource Speech Recognition Challenge for Indian Languages, and on the Indian MANDI project), while SRI-B will provide data and industrial guidelines and will produce technology demonstrators.
Team Members
Žižka Josef, Ing. (UPGM FIT VUT), team leader
Egorova Ekaterina, Ing., Ph.D. (UPGM FIT VUT)
Kocour Martin, Ing. (UPGM FIT VUT)
Peng Junyi, Msc. Eng. (UPGM FIT VUT)
Plchot Oldřich, Ing., Ph.D. (UPGM FIT VUT)
Skácel Miroslav, Ing. (UPGM FIT VUT)
Yusuf Bolaji (UPGM FIT VUT)
Žmolíková Kateřina, Ing., Ph.D. (UPGM FIT VUT)
Publications
2023
- SILNOVA Anna, SLAVÍČEK Josef, MOŠNER Ladislav, KLČO Michal, PLCHOT Oldřich, MATĚJKA Pavel, PENG Junyi, STAFYLAKIS Themos and BURGET Lukáš. ABC System Description for NIST LRE 2022. In: Proceedings of NIST LRE 2022 Workshop. Washington DC: National Institute of Standards and Technology, 2023, pp. 1-5.
- PENG Junyi, PLCHOT Oldřich, STAFYLAKIS Themos, MOŠNER Ladislav, BURGET Lukáš and ČERNOCKÝ Jan. An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification. In: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Doha: IEEE Signal Processing Society, 2023, pp. 555-562. ISBN 978-1-6654-7189-3.
- KESIRAJU Santosh, BENEŠ Karel, TIKHONOV Maksim and ČERNOCKÝ Jan. BUT Systems for IWSLT 2023 Marathi - Hindi Low Resource Speech Translation Task. In: 20th International Conference on Spoken Language Translation, IWSLT 2023 - Proceedings of the Conference. Toronto (in-person and online): Association for Computational Linguistics, 2023, pp. 227-234. ISBN 978-1-959429-84-5.
- MATĚJKA Pavel, SILNOVA Anna, SLAVÍČEK Josef, MOŠNER Ladislav, PLCHOT Oldřich, KLČO Michal, PENG Junyi, STAFYLAKIS Themos and BURGET Lukáš. Description and Analysis of ABC Submission to NIST LRE 2022. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Dublin: International Speech Communication Association, 2023, pp. 511-515. ISSN 1990-9772.
- STAFYLAKIS Themos, MOŠNER Ladislav, KAKOUROS Sofoklis, PLCHOT Oldřich, BURGET Lukáš and ČERNOCKÝ Jan. Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations. In: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Doha: IEEE Signal Processing Society, 2023, pp. 1136-1143. ISBN 978-1-6654-7189-3.
- PENG Junyi, PLCHOT Oldřich, STAFYLAKIS Themos, MOŠNER Ladislav, BURGET Lukáš and ČERNOCKÝ Jan. Improving Speaker Verification with Self-Pretrained Transformer Models. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Dublin: International Speech Communication Association, 2023, pp. 5361-5365. ISSN 1990-9772.
- MOŠNER Ladislav, PLCHOT Oldřich, PENG Junyi, BURGET Lukáš and ČERNOCKÝ Jan. Multi-Channel Speech Separation with Cross-Attention and Beamforming. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Dublin: International Speech Communication Association, 2023, pp. 1693-1697. ISSN 1990-9772.
- ŽMOLÍKOVÁ Kateřina, DELCROIX Marc, OCHIAI Tsubasa, ČERNOCKÝ Jan, KINOSHITA Keisuke and YU Dong. Neural Target Speech Extraction: An overview. IEEE Signal Processing Magazine, vol. 40, no. 3, 2023, pp. 8-29. ISSN 1558-0792.
- PENG Junyi, STAFYLAKIS Themos, GU Rongzhi, PLCHOT Oldřich, MOŠNER Ladislav, BURGET Lukáš and ČERNOCKÝ Jan. Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Rhodes Island: IEEE Signal Processing Society, 2023, pp. 1-5. ISBN 978-1-7281-6327-7.
2022
- ŠVEC Ján, ŽMOLÍKOVÁ Kateřina, KOCOUR Martin, DELCROIX Marc, OCHIAI Tsubasa, MOŠNER Ladislav and ČERNOCKÝ Jan. Analysis of impact of emotions on target speech extraction and speech separation. In: Proceedings of The 17th International Workshop on Acoustic Signal Enhancement (IWAENC 2022). Bamberg: IEEE Signal Processing Society, 2022, pp. 1-5. ISBN 978-1-6654-6867-1.
- SILNOVA Anna, STAFYLAKIS Themos, MOŠNER Ladislav, PLCHOT Oldřich, ROHDIN Johan A., MATĚJKA Pavel, BURGET Lukáš, GLEMBEK Ondřej and BRUMMER Johan Nikolaas Langenhoven. Analyzing speaker verification embedding extractors and back-ends under language and channel mismatch. In: Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022). Beijing: International Speech Communication Association, 2022, pp. 9-16.
- KOCOUR Martin, UMESH Jahnavi, KARAFIÁT Martin, ŠVEC Ján, LOPEZ Fernando, BENEŠ Karel, DIEZ Sánchez Mireia, SZŐKE Igor, LUQUE Jordi, VESELÝ Karel, BURGET Lukáš and ČERNOCKÝ Jan. BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge. In: Proceedings of IberSpeech 2022. Granada: International Speech Communication Association, 2022, pp. 276-280.
- ALAM Jahangir, BURGET Lukáš, GLEMBEK Ondřej, MATĚJKA Pavel, MOŠNER Ladislav, PLCHOT Oldřich, ROHDIN Johan A., SILNOVA Anna and STAFYLAKIS Themos et al. Development of ABC systems for the 2021 edition of NIST Speaker Recognition evaluation. In: Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022). Beijing: International Speech Communication Association, 2022, pp. 346-353.
- HAN Jiangyu, LONG Yanhua, BURGET Lukáš and ČERNOCKÝ Jan. DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation and Extraction. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Singapore: IEEE Signal Processing Society, 2022, pp. 7292-7296. ISBN 978-1-6654-0540-9.
- PENG Junyi, GU Rongzhi, MOŠNER Ladislav, PLCHOT Oldřich, BURGET Lukáš and ČERNOCKÝ Jan. Learnable Sparse Filterbank for Speaker Verification. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Incheon: International Speech Communication Association, 2022, pp. 5110-5114. ISSN 1990-9772.
- MOŠNER Ladislav, PLCHOT Oldřich, BURGET Lukáš and ČERNOCKÝ Jan. Multi-Channel Speaker Verification with Conv-Tasnet Based Beamformer. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Singapore: IEEE Signal Processing Society, 2022, pp. 7982-7986. ISBN 978-1-6654-0540-9.
- MOŠNER Ladislav, PLCHOT Oldřich, BURGET Lukáš and ČERNOCKÝ Jan. Multisv: Dataset for Far-Field Multi-Channel Speaker Verification. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Singapore: IEEE Signal Processing Society, 2022, pp. 7977-7981. ISBN 978-1-6654-0540-9.
- PENG Junyi, ZHANG Chunlei, ČERNOCKÝ Jan and YU Dong. Progressive contrastive learning for self-supervised text-independent speaker verification. In: Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022). Beijing: International Speech Communication Association, 2022, pp. 17-24.
- KOCOUR Martin, ŽMOLÍKOVÁ Kateřina, ONDEL Yang Lucas Antoine Francois, ŠVEC Ján, DELCROIX Marc, OCHIAI Tsubasa, BURGET Lukáš and ČERNOCKÝ Jan. Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Incheon: International Speech Communication Association, 2022, pp. 4955-4959. ISSN 1990-9772.
- DE Benito Gorron Diego, ŽMOLÍKOVÁ Kateřina and TORRE Toledano Doroteo. Source Separation for Sound Event Detection in domestic environments using jointly trained models. In: Proceedings of The 17th International Workshop on Acoustic Signal Enhancement (IWAENC 2022). Bamberg: IEEE Signal Processing Society, 2022, pp. 1-5. ISBN 978-1-6654-6867-1.
- BASKAR Murali K., HERZIG Tim, NGUYEN Diana, DIEZ Sánchez Mireia, POLZEHL Tim, BURGET Lukáš and ČERNOCKÝ Jan. Speaker adaptation for Wav2vec2 based dysarthric ASR. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Incheon: International Speech Communication Association, 2022, pp. 3403-3407. ISSN 1990-9772.
- EGOROVA Ekaterina, VYDANA Hari K., BURGET Lukáš and ČERNOCKÝ Jan. Spelling-Aware Word-Based End-to-End ASR. IEEE Signal Processing Letters, vol. 29, no. 29, 2022, pp. 1729-1733. ISSN 1558-2361.
2021
- YUSUF Bolaji, ONDEL Yang Lucas Antoine Francois, BURGET Lukáš, ČERNOCKÝ Jan and SARAÇLAR Murat. A Hierarchical Subspace Model for Language-Attuned Acoustic Unit Discovery. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Ontario: IEEE Signal Processing Society, 2021, pp. 3710-3714. ISBN 978-1-7281-7605-5.
- LANDINI Federico Nicolás, GLEMBEK Ondřej, MATĚJKA Pavel, ROHDIN Johan A., BURGET Lukáš, DIEZ Sánchez Mireia and SILNOVA Anna. Analysis of the BUT Diarization System for Voxconverse Challenge. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Ontario: IEEE Signal Processing Society, 2021, pp. 5819-5823. ISBN 978-1-7281-7605-5.
- ŽMOLÍKOVÁ Kateřina, DELCROIX Marc, RAJ Desh, WATANABE Shinji and ČERNOCKÝ Jan. Auxiliary Loss Function for Target Speech Extraction and Recognition with Weak Supervision Based on Speaker Characteristics. In: Proceedings of 2021 Interspeech. Brno: International Speech Communication Association, 2021, pp. 1464-1468. ISSN 1990-9772.
- KOCOUR Martin, CÁMBARA Guillermo, LUQUE Jordi, BONET David, FARRÚS Mireia, KARAFIÁT Martin, VESELÝ Karel and ČERNOCKÝ Jan. BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge. In: Proceedings of IberSPEECH 2021. Valladolid: International Speech Communication Association, 2021, pp. 113-117.
- BASKAR Murali K., BURGET Lukáš, WATANABE Shinji, ASTUDILLO Ramon and ČERNOCKÝ Jan. Eat: Enhanced ASR-TTS for Self-Supervised Speech Recognition. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Ontario: IEEE Signal Processing Society, 2021, pp. 6753-6757. ISBN 978-1-7281-7605-5.
- PENG Junyi, QU Xiaoyang, GU Rongzhi, WANG Jianzong, XIAO Jing, BURGET Lukáš and ČERNOCKÝ Jan. Effective Phase Encoding for End-To-End Speaker Verification. In: Proceedings Interspeech 2021. Brno: International Speech Communication Association, 2021, pp. 2366-2370. ISSN 1990-9772.
- PENG Junyi, QU Xiaoyang, WANG Jianzong, GU Rongzhi, XIAO Jing, BURGET Lukáš and ČERNOCKÝ Jan. ICSpk: Interpretable Complex Speaker Embedding Extractor from Raw Waveform. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Brno: International Speech Communication Association, 2021, pp. 511-515. ISSN 1990-9772.
- ŽMOLÍKOVÁ Kateřina, DELCROIX Marc, BURGET Lukáš, NAKATANI Tomohiro and ČERNOCKÝ Jan. Integration of Variational Autoencoder and Spatial Clustering for Adaptive Multi-Channel Neural Speech Separation. In: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings. Shenzhen (virtual): IEEE Signal Processing Society, 2021, pp. 889-896. ISBN 978-1-7281-7066-4.
2020
- LOZANO Díez Alicia, SILNOVA Anna, PULUGUNDLA Bhargav, ROHDIN Johan A., VESELÝ Karel, BURGET Lukáš, PLCHOT Oldřich, GLEMBEK Ondřej, NOVOTNÝ Ondřej and MATĚJKA Pavel. BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Shanghai: International Speech Communication Association, 2020, pp. 761-765. ISSN 1990-9772.
Products
2022
- BCN2BRNO Automatic speech recognition system for Albayzin 2022 Speech to Text Challenge, software, 2022
Authors: Kocour Martin, Umesh Jahnavi, Karafiát Martin, Švec Ján, Lopez Fernando, Beneš Karel, Diez Sánchez Mireia, Szőke Igor, Luque Jordi, Veselý Karel, Burget Lukáš, Černocký Jan