Publication Details

Advancing speaker embedding learning: Wespeaker toolkit for research and production

WANG, S.; CHEN, Z.; HAN, B.; WANG, H.; XIANG, X.; ROHDIN, J.; SILNOVA, A.; QIAN, Y.; LI, H. Advancing speaker embedding learning: Wespeaker toolkit for research and production. Speech Communication, 2024, vol. 162, no. 103104, p. 1-12. ISSN: 0167-6393.

Czech title

Pokroky v trénování embeddingů řečníků: toolkit Wespeaker pro výzkum a produkci

Type

journal article

Language

English

Authors

Wang Shuai
CHEN, Z.
HAN, B.
WANG, H.
XIANG, X.
Rohdin Johan Andréas, M.Sc., Ph.D. (DCGM)
Silnova Anna, M.Sc., Ph.D. (DCGM)
Qian Yanmin
Li Haizhou
and others

Keywords

Wespeaker; Speaker embedding learning; SSL; Open-source

Abstract

Speaker modeling plays a crucial role in various tasks, and fixed-dimensional
vector representations, known as speaker embeddings, are the predominant modeling
approach. These embeddings are typically evaluated within the framework of
speaker verification, yet their utility extends to a broad scope of related tasks
including speaker diarization, speech synthesis, voice conversion, and target
speaker extraction. This paper presents Wespeaker, a user-friendly toolkit
designed for both research and production purposes, dedicated to the learning of
speaker embeddings. Wespeaker offers scalable data management, state-of-the-art
speaker embedding models, and self-supervised learning training schemes with the
potential to leverage large-scale unlabeled real-world data. The toolkit
incorporates structured recipes that have been successfully adopted in winning
systems across various speaker verification challenges, ensuring highly
competitive results. For production-oriented development, Wespeaker integrates
CPU- and GPU-compatible deployment and runtime code, supporting mainstream
platforms such as Windows, Linux, and Mac, and on-device chips such as the Horizon X3 PI.
Wespeaker also offers off-the-shelf, high-quality speaker embeddings through
various pretrained models, which can be readily applied to
different tasks that require speaker modeling. The toolkit is publicly available
at https://github.com/wenet-e2e/wespeaker.

Published

2024

Pages

1–12

Journal

Speech Communication, vol. 162, no. 103104, ISSN 0167-6393

DOI

10.1016/j.specom.2024.103104

UT WoS

001279201500001

EID Scopus

2-s2.0-85199203394

BibTeX

@article{BUT193986,
  author="WANG, S. and CHEN, Z. and HAN, B. and WANG, H. and XIANG, X. and ROHDIN, J. and SILNOVA, A. and QIAN, Y. and LI, H.",
  title="Advancing speaker embedding learning: Wespeaker toolkit for research and production",
  journal="Speech Communication",
  year="2024",
  volume="162",
  number="103104",
  pages="1--12",
  doi="10.1016/j.specom.2024.103104",
  issn="0167-6393",
  url="https://pdf.sciencedirectassets.com/271578/1-s2.0-S0167639324X00060/1-s2.0-S0167639324000761/main.pdf?X-Amz-Security-Token=IQoJb3JpZ2luX2VjEAsaCXVzLWVhc3QtMSJIMEYCIQC8Doe66%2Bu6V%2FODd2NY6EZwVTEeN05avzWi09%2FPx3ob%2FQIhAP%2BOyz3L2hXSsDYY4l3zSuz1pzOjFiaTh%"
}

Files

pdf wang_speech communication_2024.pdf 2 MB