Publication Details
Advancing speaker embedding learning: Wespeaker toolkit for research and production
Chen Zhengyang (SJTU)
Han Bing (SJTU)
Wang Hongji (Tencent)
Xiang Xu (SJTU)
Rohdin Johan A., Dr. (DCGM FIT BUT)
Silnova Anna, MSc., Ph.D. (DCGM FIT BUT)
Qian Yanmin (SJTU)
and others
- https://www.fit.vut.cz/research/group/speech/public/publi/2024/wang_speech%20communication_2024.pdf PDF
Wespeaker; Speaker embedding learning; SSL; Open-source
Speaker modeling plays a crucial role in various tasks, and fixed-dimensional vector representations, known as speaker embeddings, are the predominant modeling approach. These embeddings are typically evaluated within the framework of speaker verification, yet their utility extends to a broad scope of related tasks including speaker diarization, speech synthesis, voice conversion, and target speaker extraction. This paper presents Wespeaker, a user-friendly toolkit designed for both research and production purposes, dedicated to the learning of speaker embeddings. Wespeaker offers scalable data management, state-of-the-art speaker embedding models, and self-supervised learning training schemes with the potential to leverage large-scale unlabeled real-world data. The toolkit incorporates structured recipes that have been successfully adopted in winning systems across various speaker verification challenges, ensuring highly competitive results. For production-oriented development, Wespeaker integrates CPU- and GPU-compatible deployment and runtime codes, supporting mainstream platforms such as Windows, Linux, Mac and on-device chips such as horizon X3'PI. Wespeaker also provides off-the-shelf high-quality speaker embeddings by providing various pretrained models, which can be effortlessly applied to different tasks that require speaker modeling. The toolkit is publicly available at https://github.com/wenet-e2e/wespeaker.
@ARTICLE{FITPUB13337,
  author = "Shuai Wang and Zhengyang Chen and Bing Han and Hongji Wang and Xu Xiang and A. Johan Rohdin and Anna Silnova and Yanmin Qian and Haizhou Li and others",
  title = "Advancing speaker embedding learning: Wespeaker toolkit for research and production",
  pages = "1--12",
  journal = "Speech Communication",
  volume = 162,
  number = 103104,
  year = 2024,
  ISSN = "0167-6393",
  doi = "10.1016/j.specom.2024.103104",
  language = "english",
  url = "https://www.fit.vut.cz/research/publication/13337"
}