Publication Details

Effective Phase Encoding for End-To-End Speaker Verification

PENG Junyi, QU Xiaoyang, GU Rongzhi, WANG Jianzong, XIAO Jing, BURGET Lukáš and ČERNOCKÝ Jan. Effective Phase Encoding for End-To-End Speaker Verification. In: Proceedings Interspeech 2021. Brno: International Speech Communication Association, 2021, pp. 2366-2370. ISSN 1990-9772. Available from: https://www.isca-speech.org/archive/interspeech_2021/peng21c_interspeech.html

Czech title

Efektivní modelování fáze v end-to-end rozpoznávání mluvčího (English: Effective Phase Modelling in End-to-End Speaker Recognition)

Type

conference paper

Language

English

Authors

Peng Junyi, MSc. Eng. (DCGM FIT BUT)
Qu Xiaoyang (PATS)
Gu Rongzhi (PKUSZ)
Wang Jianzong (PATS)
Xiao Jing (PATS)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)

Keywords

end-to-end speaker verification, phase information, group delay, on-the-fly

Abstract

Magnitude-spectrum-based features are widely used and have proven effective in speech processing. In contrast, the phase spectrum is usually ignored, because the phase wrapping phenomenon makes the patterns hidden in phase difficult to model and interpret intuitively. In this paper, we explore novel phase-spectrum-based features, named Learnable Group Delay (LearnGD), to capture useful information in speech signals. Specifically, the group delay (GD), defined as the negative derivative of the phase spectrum with respect to frequency, is first used to unwrap the phase. Then, to suppress the spiky nature of GD, which is caused by zeros of the signal's Z-transform lying close to the unit circle, a carefully designed light convolutional smoothing layer is employed to reconstruct the GD. Finally, an exponential hyper-parameter is applied to the reconstructed GD features to restore the spectral range, producing the LearnGD features. For performance evaluation, speaker verification experiments are conducted on the VoxCeleb2 corpus. Compared with the traditional acoustic feature derived from the magnitude spectrum, the proposed phase-based features achieve a 27.8% relative improvement in equal error rate (EER). Furthermore, experimental results on the TIMIT phoneme recognition task also demonstrate the effectiveness of the proposed phase-based features.
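The three-stage pipeline described in the abstract (group delay, smoothing, exponent) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The group delay is computed with the standard DFT identity tau = (X_R*Y_R + X_I*Y_I) / |X|^2, where Y is the DFT of n*x[n]; this avoids explicit phase unwrapping. The fixed moving average in `smooth` is only a stand-in for the paper's learnable convolutional smoothing layer, and the sign-preserving power law in `compress` (parameter `alpha`) is an assumed form of the exponential hyper-parameter. All function names and default values are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for a short demo frame."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def group_delay(frame, eps=1e-10):
    """Group delay tau(w) = -d(phase)/dw, computed without phase unwrapping.

    Uses tau = (X_R*Y_R + X_I*Y_I) / |X|^2, where X = DFT(x[n]) and
    Y = DFT(n * x[n]). `eps` guards against division by zero; bins where
    |X| is tiny (zeros near the unit circle) produce the spikes mentioned above.
    """
    X = dft(frame)
    Y = dft([n * v for n, v in enumerate(frame)])
    return [
        (xk.real * yk.real + xk.imag * yk.imag) / (abs(xk) ** 2 + eps)
        for xk, yk in zip(X, Y)
    ]


def smooth(values, width=3):
    """Hypothetical stand-in for the learnable smoothing layer:
    a fixed moving average whose window is truncated at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def compress(values, alpha=0.4):
    """Assumed form of the exponential hyper-parameter: sign(v) * |v|**alpha."""
    return [math.copysign(abs(v) ** alpha, v) for v in values]


def learngd_like_feature(frame, width=3, alpha=0.4):
    """Chain the three illustrative steps for a single analysis frame."""
    return compress(smooth(group_delay(frame), width), alpha)


if __name__ == "__main__":
    # A unit impulse delayed by 3 samples has a constant group delay of 3.
    frame = [0.0] * 16
    frame[3] = 1.0
    print([round(t, 6) for t in group_delay(frame)])
```

The impulse check is a convenient sanity test: a pure delay of d samples has linear phase -w*d, so its group delay is d at every frequency bin. A learned smoothing kernel, as the abstract describes, would replace the fixed averaging weights with trainable ones.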

Published

2021

Pages

2366-2370

Journal

Proceedings of Interspeech - on-line, vol. 2021, no. 8, ISSN 1990-9772

Proceedings

Proceedings Interspeech 2021

Conference

Interspeech Conference, Brno, CZ

Publisher

International Speech Communication Association

Place

Brno, CZ

DOI

10.21437/Interspeech.2021-2025

UT WoS

000841879502096

EID Scopus

2-s2.0-85119247671

BibTeX

@INPROCEEDINGS{FITPUB12607,
   author = "Junyi Peng and Xiaoyang Qu and Rongzhi Gu and Jianzong Wang and Jing Xiao and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "Effective Phase Encoding for End-To-End Speaker Verification",
   pages = "2366--2370",
   booktitle = "Proceedings Interspeech 2021",
   journal = "Proceedings of Interspeech - on-line",
   volume = 2021,
   number = 8,
   year = 2021,
   location = "Brno, CZ",
   publisher = "International Speech Communication Association",
   ISSN = "1990-9772",
   doi = "10.21437/Interspeech.2021-2025",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12607"
}