Publication Details

Boosting of Contextual Information in ASR for Air-Traffic Call-Sign Recognition

KOCOUR Martin, VESELÝ Karel, BLATT Alexander, ZULUAGA-GOMEZ Juan, SZŐKE Igor, ČERNOCKÝ Jan, KLAKOW Dietrich and MOTLÍČEK Petr. Boosting of Contextual Information in ASR for Air-Traffic Call-Sign Recognition. In: Proceedings Interspeech 2021. Brno: International Speech Communication Association, 2021, pp. 3301-3305. ISSN 1990-9772. Available from: https://www.isca-speech.org/archive/interspeech_2021/kocour21_interspeech.html
Czech title
Zvýrazňování kontextové informace v přepisu řeči pro rozpoznávání volacích znaků v letecké dopravě
Type
conference paper
Language
english
Authors
Kocour Martin, Ing. (DCGM FIT BUT)
Veselý Karel, Ing., Ph.D. (DCGM FIT BUT)
Blatt Alexander (UDS)
Zuluaga-Gomez Juan (IDIAP)
Szőke Igor, Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
Klakow Dietrich (UDS)
Motlíček Petr, Ing., Ph.D. (IDIAP)
URL
https://www.isca-speech.org/archive/interspeech_2021/kocour21_interspeech.html
Keywords

Air Traffic Control, Automatic Speech Recognition, Contextual Adaptation, Call-sign Recognition, Call-sign Detection, OpenSky Network

Abstract

Contextual adaptation of ASR can be very beneficial for multi-accent and often noisy Air-Traffic Control (ATC) speech. Our focus is call-sign recognition, which can be used to track the conversations of ATC operators with individual airplanes. We developed a two-stage boosting strategy consisting of HCLG boosting and Lattice boosting. Both are implemented as WFST compositions, and the contextual information is specific to each utterance. In HCLG boosting, score discounts are given to individual words, while in Lattice boosting they are given to word sequences. The context data originate from the surveillance database of the OpenSky Network. From this database we obtain lists of call-signs, which are then made more likely to appear in the best ASR hypothesis. This also improves the accuracy of the NLU module that recognizes the call-signs from the best ASR hypothesis. As part of the ATCO2 project, we collected the liveatc test set2. Boosting the call-signs yields a 4.7% absolute WER improvement and a 27.1% absolute increase in Call-Sign recognition Accuracy (CSA). Our best result of 82.9% CSA is quite good, given that the data are noisy and the WER of 28.4% is relatively high. We believe there is still room for improvement.
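The difference between word-level and sequence-level score discounts can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the authors' implementation: the paper performs both stages as WFST compositions (on the HCLG graph and on lattices), whereas this sketch approximates them by rescoring an explicit n-best list. Costs are assumed to be negative log-likelihoods (lower is better). The function names, the hypotheses, the call-sign list and the discount values are all hypothetical.

```python
from typing import Iterable, List, Sequence, Set, Tuple

# A hypothesis is (word sequence, total cost); lower cost = more likely,
# as with negative log-likelihoods in the tropical semiring (assumption).
Hypothesis = Tuple[List[str], float]


def word_boost(hyps: Iterable[Hypothesis], boosted: Set[str],
               discount: float) -> List[Hypothesis]:
    """Word-level analogue (hypothetical): discount every boosted word token."""
    return [(words, cost - discount * sum(w in boosted for w in words))
            for words, cost in hyps]


def count_occurrences(words: Sequence[str], pattern: Sequence[str]) -> int:
    """Count possibly overlapping occurrences of `pattern` inside `words`."""
    n = len(pattern)
    if n == 0:
        return 0
    return sum(list(words[i:i + n]) == list(pattern)
               for i in range(len(words) - n + 1))


def sequence_boost(hyps: Iterable[Hypothesis],
                   callsigns: Sequence[Sequence[str]],
                   discount: float) -> List[Hypothesis]:
    """Sequence-level analogue (hypothetical): discount each complete call-sign."""
    return [(words, cost - discount * sum(count_occurrences(words, cs)
                                          for cs in callsigns))
            for words, cost in hyps]


def best(hyps: Iterable[Hypothesis]) -> List[str]:
    """Return the word sequence of the lowest-cost hypothesis."""
    return min(hyps, key=lambda h: h[1])[0]


if __name__ == "__main__":
    # Hypothetical per-utterance context, e.g. call-signs of nearby aircraft.
    callsigns = [["ryanair", "one", "two", "three"],
                 ["lufthansa", "four", "five"]]
    boosted = {w for cs in callsigns for w in cs}
    nbest = [
        (["rain", "air", "one", "two", "three", "climb"], 10.0),
        (["ryanair", "one", "two", "three", "climb"], 10.5),
        (["ryanair", "one", "two", "tree", "climb"], 10.2),
    ]
    print(" ".join(best(nbest)))   # -> rain air one two three climb
    stage1 = word_boost(nbest, boosted, discount=0.2)
    print(" ".join(best(stage1)))  # -> rain air one two three climb
    stage2 = sequence_boost(stage1, callsigns, discount=1.0)
    print(" ".join(best(stage2)))  # -> ryanair one two three climb
```

In this toy setting, the word-level discount rewards any boosted token, so a hypothesis that merely shares the digits of a call-sign still wins. The sequence-level discount rewards only a complete call-sign, which moves the correct hypothesis to the top.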

Published
2021
Pages
3301-3305
Journal
Proceedings of Interspeech - on-line, vol. 2021, no. 8, ISSN 1990-9772
Proceedings
Proceedings Interspeech 2021
Conference
Interspeech Conference, Brno, CZ
Publisher
International Speech Communication Association
Place
Brno, CZ
DOI
10.21437/Interspeech.2021-1619
UT WoS
000841879503079
EID Scopus
BibTeX
@INPROCEEDINGS{FITPUB12610,
   author = "Martin Kocour and Karel Vesel\'{y} and Alexander Blatt and Juan Zuluaga-Gomez and Igor Sz\H{o}ke and Jan \v{C}ernock\'{y} and Dietrich Klakow and Petr Motl\'{i}\v{c}ek",
   title = "Boosting of Contextual Information in ASR for Air-Traffic Call-Sign Recognition",
   pages = "3301--3305",
   booktitle = "Proceedings Interspeech 2021",
   journal = "Proceedings of Interspeech - on-line",
   volume = 2021,
   number = 8,
   year = 2021,
   location = "Brno, CZ",
   publisher = "International Speech Communication Association",
   ISSN = "1990-9772",
   doi = "10.21437/Interspeech.2021-1619",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12610"
}