Publication Details

Unsupervised Adaptive Speaker Recognition by Coupling-Regularized Optimal Transport

ZHANG, R.; WEI, J.; LU, X.; LU, W.; JIN, D.; ZHANG, L.; XU, J. Unsupervised Adaptive Speaker Recognition by Coupling-Regularized Optimal Transport. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, 2024, vol. 32, no. 1, p. 3603-3617. ISSN: 2329-9290.

Type

journal article

Language

English

Authors

Zhang Ruiteng
Wei Jianguo
Lu Xugang
Lu Wenhuan
Jin Di
Zhang Lin, Ph.D.
Xu Junhai

URL

https://xplorestaging.ieee.org/document/10596689?denied

Keywords

Speaker recognition; unsupervised domain adaptation; optimal transport; coupling regularization

Abstract

Cross-domain speaker recognition (SR) can be improved by unsupervised domain adaptation (UDA) algorithms. UDA algorithms often reduce domain mismatch at the cost of decreasing the discrimination of speaker features. In contrast, optimal transport (OT) has the potential to achieve domain alignment while preserving the speaker discrimination capability in UDA applications; however, naively applying OT to measure global probability distribution discrepancies between the source and target domains may induce negative transports where samples belonging to different speakers are coupled in transportation. These negative transports reduce the SR model's discriminative power, degrading the SR performance. This paper proposes a coupling-regularized optimal transport (CROT) algorithm for cross-domain SR to reduce the negative transport during UDA. In the proposed CROT, two consecutive processing modules regularize the coupling paths for the OT solution: a progressive inter-speaker constraint (PISC) module and a coupling-smoothed regularization (CSR) module. The PISC, designed as a pseudo-label memory bank with curriculum learning, is first applied to select valid samples to guarantee that coupling samples are from the same speaker. The CSR, designed to control the information entropy of the coupling paths further, reduces the effect of negative transport in UDA. To evaluate the effectiveness of the proposed algorithm, cross-domain SR experiments were conducted under different target domains, speaker encoders, corpora, and acoustic features. Experimental results showed that CROT achieved a 50% relative reduction in equal error rates compared to conventional OT-based UDAs, outperforming the state-of-the-art UDAs.
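
The idea of "negative transport" described in the abstract can be illustrated with a small sketch. This is not the authors' CROT implementation: it uses generic entropy-regularized OT (Sinkhorn iterations) on synthetic 2-D points, and it stands in for the PISC and CSR modules with two simple assumptions, namely a fixed cost penalty on pairs whose source label and target pseudo-label disagree, and a regularization weight `eps` that governs the entropy of the coupling. All names (`sinkhorn_plan`, `negative_mass`, `penalty`, the toy data) are hypothetical.

```python
import numpy as np


def sinkhorn_plan(cost, eps=0.1, n_iters=500):
    """Entropy-regularized OT plan between two uniform distributions.

    Larger eps gives a smoother (higher-entropy) coupling; smaller eps
    gives a sparser one.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform source weights
    b = np.full(m, 1.0 / m)  # uniform target weights
    K = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]


def pairwise_sq_dist(x, y):
    """Squared Euclidean distance between every row of x and every row of y."""
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)


def negative_mass(plan, src_labels, tgt_labels):
    """Total transported mass on pairs whose speaker labels differ."""
    mismatch = src_labels[:, None] != tgt_labels[None, :]
    return float(plan[mismatch].sum())


# Toy data (assumption): two "speakers", target domain shifted along x.
rng = np.random.default_rng(0)
src_labels = np.array([0, 0, 0, 1, 1, 1])
tgt_pseudo = np.array([0, 0, 0, 1, 1, 1])  # pseudo-labels assumed correct here
centers = np.array([[0.0, 0.0], [1.0, 0.0]])
src = centers[src_labels] + 0.3 * rng.standard_normal((6, 2))
tgt = centers[tgt_pseudo] + 0.3 * rng.standard_normal((6, 2)) + np.array([0.5, 0.0])

cost = pairwise_sq_dist(src, tgt)
cost = cost / cost.max()  # normalize so exp(-cost / eps) stays well scaled

# Plain OT: couples samples by distance only, ignoring speaker identity.
plain = sinkhorn_plan(cost)

# Label-aware OT (stand-in for a same-speaker constraint): penalize
# couplings between different speakers with a finite extra cost.
penalty = 2.0
mismatch = src_labels[:, None] != tgt_pseudo[None, :]
masked = sinkhorn_plan(cost + penalty * mismatch)

print("cross-speaker mass, plain OT:      ", negative_mass(plain, src_labels, tgt_pseudo))
print("cross-speaker mass, label-aware OT:", negative_mass(masked, src_labels, tgt_pseudo))
```

In this toy setting the label-aware plan places less mass on cross-speaker pairs than the plain plan does, which is the effect the abstract attributes to regularizing the coupling paths. The sketch assumes the pseudo-labels are correct; the abstract's curriculum-based sample selection addresses the realistic case where they are not.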

Published

2024

Pages

3603–3617

Journal

IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, vol. 32, no. 1, ISSN 2329-9290

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Place

PISCATAWAY

DOI

10.1109/TASLP.2024.3426934

UT WoS

001283673700022

EID Scopus

2-s2.0-85198359234

BibTeX

@article{BUT197616,
  author="Ruiteng {Zhang} and Jianguo {Wei} and Xugang {Lu} and Wenhuan {Lu} and Di {Jin} and Lin {Zhang} and Junhai {Xu}",
  title="Unsupervised Adaptive Speaker Recognition by Coupling-Regularized Optimal Transport",
  journal="IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING",
  year="2024",
  volume="32",
  number="1",
  pages="3603--3617",
  doi="10.1109/TASLP.2024.3426934",
  issn="2329-9290",
  url="https://xplorestaging.ieee.org/document/10596689?denied"
}