Publication Details

Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations

STAFYLAKIS, T.; MOŠNER, L.; KAKOUROS, S.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations. In 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Doha: IEEE Signal Processing Society, 2023. p. 1136-1143. ISBN: 978-1-6654-7189-3.

Czech title

Extrakce informací o mluvčím a emocích ze self-supervised modelů řeči pomocí korelace po kanálech

Type

conference paper

Language

English

Authors

Stafylakis Themos
Mošner Ladislav, Ing. (DCGM)
Kakouros S.
Plchot Oldřich, Ing., Ph.D. (DCGM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)

URL

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10023345

Keywords

Speaker identification, speaker verification, emotion recognition, self-supervised models

Abstract

Self-supervised learning of speech representations from large amounts of unlabeled data has enabled state-of-the-art results in several speech processing tasks. Aggregating these speech representations across time is typically approached by using descriptive statistics, and in particular, using the first- and second-order statistics of representation coefficients. In this paper, we examine an alternative way of extracting speaker and emotion information from self-supervised trained models, based on the correlations between the coefficients of the representations - correlation pooling. We show improvements over mean pooling and further gains when the pooling methods are combined via fusion. The code is available at github.com/Lamomal/s3prl_correlation.
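
To illustrate the contrast the abstract draws between mean pooling and correlation pooling, the following is a minimal sketch and not the authors' implementation (their code is in the repository named above). It assumes frame-level representations arrive as a list of T frames with C channels each, and that correlation pooling means the Pearson correlation between every pair of channels computed over time. The function names, the `eps` stabilizer, and the toy input are hypothetical choices made for this example only.

```python
import math


def mean_pooling(frames):
    """Average each channel over time; returns a list of C means."""
    num_frames = len(frames)
    return [sum(channel) / num_frames for channel in zip(*frames)]


def correlation_pooling(frames, eps=1e-8):
    """Pearson correlation between every pair of channels, taken over time.

    Returns the strictly upper-triangular entries as a flat list of
    C * (C - 1) / 2 values. The diagonal is skipped because it is always 1.
    `eps` is an assumed stabilizer that guards against constant channels.
    """
    means = mean_pooling(frames)
    # Center each channel, then scale it to unit norm along the time axis.
    centered = [
        [value - mean for value in channel]
        for channel, mean in zip(zip(*frames), means)
    ]
    norms = [math.sqrt(sum(v * v for v in channel)) + eps for channel in centered]
    normalized = [
        [v / norm for v in channel] for channel, norm in zip(centered, norms)
    ]
    num_channels = len(normalized)
    # The dot product of two unit-norm, centered channels is their correlation.
    return [
        sum(a * b for a, b in zip(normalized[i], normalized[j]))
        for i in range(num_channels)
        for j in range(i + 1, num_channels)
    ]


# Toy input (hypothetical): 4 frames, 3 channels.
# Channel 1 is twice channel 0; channel 2 decreases as channel 0 increases.
frames = [
    [1.0, 2.0, 4.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 2.0],
    [4.0, 8.0, 1.0],
]

mean_vec = mean_pooling(frames)         # one value per channel
corr_vec = correlation_pooling(frames)  # one value per channel pair
```

In this sketch the mean-pooled vector has C entries while the correlation-pooled vector has C(C-1)/2, so the two describe different aspects of the same utterance. That is consistent with the abstract's remark that combining the pooling methods via fusion gives further gains, although the fusion itself is not shown here.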

Published

2023

Pages

1136–1143

Proceedings

2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings

Conference

IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, Doha, QA

ISBN

978-1-6654-7189-3

Publisher

IEEE Signal Processing Society

Place

Doha

DOI

10.1109/SLT54892.2023.10023345

UT WoS

000968851900153

EID Scopus

2-s2.0-85144287715

BibTeX

@inproceedings{BUT185160,
  author="STAFYLAKIS, T. and MOŠNER, L. and KAKOUROS, S. and PLCHOT, O. and BURGET, L. and ČERNOCKÝ, J.",
  title="Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations",
  booktitle="2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings",
  year="2023",
  pages="1136--1143",
  publisher="IEEE Signal Processing Society",
  address="Doha",
  doi="10.1109/SLT54892.2023.10023345",
  isbn="978-1-6654-7189-3",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10023345"
}

Files

pdf stafylakis_SLT2022_published in year 2023.pdf 283 kB