Publication Details

Contextual Biasing Methods for Improving Rare Word Detection in Automatic Speech Recognition

BHATTACHARJEE Mrinmoy, NIGMATULINA Iuliia, PRASAD Amrutha, RANGAPPA Pradeep, MADIKERI Srikanth, MOTLÍČEK Petr, HELMKE Hartmut and KLEINERT Matthias. Contextual Biasing Methods for Improving Rare Word Detection in Automatic Speech Recognition. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024, pp. 12652-12656. ISBN 979-8-3503-4485-1. Available from: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10447465

Czech title

Metody kontextového ovlivnění pro zlepšení detekce neobvyklých slov v automatickém rozpoznávání řeči

Type

conference paper

Language

English

Authors

Bhattacharjee Mrinmoy (IDIAP)
Nigmatulina Iuliia (IDIAP)
Prasad Amrutha (DCGM FIT BUT)
Rangappa Pradeep (IDIAP)
Madikeri Srikanth (IDIAP)
Motlíček Petr, doc. Ing., Ph.D. (DCGM FIT BUT)
Helmke Hartmut (DLR)
Kleinert Matthias (DLR)

Keywords

Automatic speech recognition, air traffic control, domain adaptation, contextual biasing, rare word recognition

Abstract

In specialized domains like Air Traffic Control (ATC), a notable challenge in porting a deployed Automatic Speech Recognition (ASR) system from one airport to another is the change in the set of crucial words that must be accurately detected in the new environment. Such words typically occur only rarely in the training data, which makes retraining the ASR system impractical. This paper explores innovative word-boosting techniques to improve the detection rate of such rare words in the ASR hypotheses for the ATC domain. Two acoustic models are investigated: a hybrid CNN-TDNNF model trained from scratch and a pre-trained wav2vec2-based XLSR model fine-tuned on a common ATC dataset. Word boosting is performed in three ways. First, an out-of-vocabulary word addition method is explored. Second, G-boosting is explored, which amends the language model before the decoding graph is built. Third, boosting is performed on the fly during decoding using lattice rescoring. The results indicate that the G-boosting method performs best, providing a relative improvement of approximately 30-43% in the recall of the boosted words. Moreover, a relative improvement of up to 48% is obtained by combining G-boosting and lattice rescoring.
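The abstract reports results as relative improvements in the recall of boosted words. The sketch below illustrates that evaluation idea together with a generic form of on-the-fly biasing. It is not the paper's method: lattice rescoring is simplified here to re-ranking an N-best list with a fixed per-word score bonus, and recall uses a bag-of-words match rather than an alignment. The names `Hypothesis`, `boost_rescore`, `boosted_word_recall`, `relative_improvement`, the bonus value, and the example words (e.g. the waypoint "lunix") are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: list[str]
    score: float  # first-pass log-domain decoder score (higher is better)


def boost_rescore(nbest, boost_words, bonus=2.0):
    """Hypothetical biasing step: add `bonus` to a hypothesis score for every
    token found in `boost_words`, then return the list sorted best-first.
    A simplified stand-in for lattice rescoring, not the paper's recipe."""
    rescored = [
        Hypothesis(h.words, h.score + bonus * sum(w in boost_words for w in h.words))
        for h in nbest
    ]
    return sorted(rescored, key=lambda h: h.score, reverse=True)


def boosted_word_recall(refs, hyps, boost_words):
    """Fraction of boosted-word occurrences in the references that also occur
    in the matching hypothesis (bag-of-words match, each hit used once)."""
    hit = total = 0
    for ref, hyp in zip(refs, hyps):
        remaining = Counter(hyp)
        for w in ref:
            if w in boost_words:
                total += 1
                if remaining[w] > 0:
                    remaining[w] -= 1
                    hit += 1
    return hit / total if total else 0.0


def relative_improvement(baseline, new):
    """Relative change of a metric with respect to its baseline value."""
    return (new - baseline) / baseline


# Usage with made-up ATC-style utterances.
boost = {"lunix", "rhein"}
nbest = [
    Hypothesis(["proceed", "direct", "linux"], -10.0),
    Hypothesis(["proceed", "direct", "lunix"], -11.0),
]
best = boost_rescore(nbest, boost)[0]  # the boosted word now wins

refs = [["proceed", "direct", "lunix"], ["contact", "rhein", "radar"]]
baseline_hyps = [["proceed", "direct", "linux"], ["contact", "rhein", "radar"]]
boosted_hyps = [best.words, ["contact", "rhein", "radar"]]

r0 = boosted_word_recall(refs, baseline_hyps, boost)
r1 = boosted_word_recall(refs, boosted_hyps, boost)
print(best.words, r0, r1, relative_improvement(r0, r1))
```

In this toy case the bonus lifts the hypothesis containing the boosted word from -11.0 to -9.0, above the competing -10.0, and boosted-word recall rises from 0.5 to 1.0. A fixed additive bonus is the simplest biasing choice; it also raises the risk of false insertions of boosted words, which is why recall alone does not tell the full story.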

Published

2024

Pages

12652-12656

Proceedings

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Conference

2024 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, Seoul, KR

ISBN

979-8-3503-4485-1

Publisher

IEEE Signal Processing Society

Place

Seoul, KR

DOI

10.1109/ICASSP48485.2024.10447465

EID Scopus

2-s2.0-85195379769

BibTeX

@INPROCEEDINGS{FITPUB13281,
   author = "Mrinmoy Bhattacharjee and Iuliia Nigmatulina and Amrutha Prasad and Pradeep Rangappa and Srikanth Madikeri and Petr Motl\'{i}\v{c}ek and Hartmut Helmke and Matthias Kleinert",
   title = "Contextual Biasing Methods for Improving Rare Word Detection in Automatic Speech Recognition",
   pages = "12652--12656",
   booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
   year = 2024,
   location = "Seoul, KR",
   publisher = "IEEE Signal Processing Society",
   ISBN = "979-8-3503-4485-1",
   doi = "10.1109/ICASSP48485.2024.10447465",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13281"
}