Dissertation Topic
Multi-Talker ASR System Leveraging Large Pre-Trained Models with Discrete Hidden Representations
Academic Year: 2025/2026
Supervisor: Burget Lukáš, doc. Ing., Ph.D.
Department: Department of Computer Graphics and Multimedia
Programs:
Information Technology (DIT) - full-time study
This dissertation topic is available for Czech studies only.
This PhD research will explore next-generation multi-talker automatic speech recognition (ASR) systems built on large pre-trained models and discrete hidden representations. A key challenge in multi-talker ASR is handling overlapping speech, which is traditionally addressed with pipelines that combine speaker diarization, source separation, and single-speaker ASR. The work will investigate how recent advances in neural audio codecs and self-supervised learning can support a more integrated and efficient approach. It will cover a study of existing methods, the design of novel architectures, and their evaluation on standard datasets, with the overarching goal of improving the robustness and scalability of multi-talker ASR in real-world scenarios.