Publication Details

Hystoc: Obtaining Word Confidences for Fusion of End-To-End ASR Systems

BENEŠ Karel, KOCOUR Martin and BURGET Lukáš. Hystoc: Obtaining Word Confidences for Fusion of End-To-End ASR Systems. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024, pp. 11276-11280. ISBN 979-8-3503-4485-1. Available from: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446739
Czech title
Hystoc: Generování konfidencí slov pro fúzi end-to-end systémů ASR
Type
conference paper
Language
english
Authors
Beneš Karel, Ing. (DCGM FIT BUT)
Kocour Martin, Ing. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
URL
Keywords

confidences measures, system fusion, end-toend systems, automatic speech recognition

Abstract

End-to-end (e2e) systems have recently gained wide popularity in automatic speech recognition. However, these systems do generally not provide well-calibrated word-level confidences. In this paper, we propose Hystoc, a simple method for obtaining word-level confidences from hypothesis-level scores. Hystoc is an iterative alignment procedure which turns hypotheses from an n-best output of the ASR system into a confusion network. Eventually, word-level confidences are obtained as posterior probabilities in the individual bins of the confusion network. We show that Hystoc provides confidences that correlate well with the accuracy of the ASR hypothesis. Furthermore, we show that utilizing Hystoc in fusion of multiple e2e ASR systems increases the gains from the fusion by up to 1% WER absolute on Spanish RTVE2020 dataset. Finally, we experiment with using Hystoc for direct fusion of n-best outputs from multiple systems, but we only achieve minor gains when fusing very similar systems.

Published
2024
Pages
11276-11280
Proceedings
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Conference
2024 IEEE International Conference on Acoustics, Speech and Signal Processing IEEE, Seoul, KR
ISBN
979-8-3503-4485-1
Publisher
IEEE Signal Processing Society
Place
Seoul, KR
DOI
EID Scopus
BibTeX
@INPROCEEDINGS{FITPUB13267,
   author = "Karel Bene\v{s} and Martin Kocour and Luk\'{a}\v{s} Burget",
   title = "Hystoc: Obtaining Word Confidences for Fusion of End-To-End ASR Systems",
   pages = "11276--11280",
   booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
   year = 2024,
   location = "Seoul, KR",
   publisher = "IEEE Signal Processing Society",
   ISBN = "979-8-3503-4485-1",
   doi = "10.1109/ICASSP48485.2024.10446739",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13267"
}
Back to top