Publication Details

BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge

KOCOUR Martin, CÁMBARA Guillermo, LUQUE Jordi, BONET David, FARRÚS Mireia, KARAFIÁT Martin, VESELÝ Karel and ČERNOCKÝ Jan. BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge. In: Proceedings of IberSPEECH 2021. Valladolid: International Speech Communication Association, 2021, pp. 113-117. Available from: https://www.isca-speech.org/archive/iberspeech_2021/kocour21_iberspeech.html

Czech title

BCN2BRNO: Fúze ASR systémů pro Albayzin 2020 Speech to Text Challenge

Type

conference paper

Language

English

Authors

Kocour Martin, Ing. (DCGM FIT BUT)
Cámbara Guillermo (UPF)
Luque Jordi (Telefónica)
Bonet David (Telefónica)
Farrús Mireia (UoB)
Karafiát Martin, Ing., Ph.D. (DCGM FIT BUT)
Veselý Karel, Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)

Keywords

fusion, end-to-end model, hybrid model, semisupervised, automatic speech recognition, convolutional neural network.

Abstract

This paper describes the joint effort of BUT and Telefónica Research on the development of Automatic Speech Recognition systems for the Albayzin 2020 Challenge. We compare approaches based on either hybrid or end-to-end models. In hybrid modelling, we explore the impact of a SpecAugment layer on performance. For end-to-end modelling, we used a convolutional neural network with gated linear units (GLUs). The performance of this model is also evaluated with an additional n-gram language model to improve word error rates. We further inspect source separation methods to extract speech from noisy environments (i.e. TV shows). More precisely, we assess the effect of using a neural-based music separator named Demucs. A fusion of our best systems achieved 23.33% WER in the official Albayzin 2020 evaluations. Aside from techniques used in our final submitted systems, we also describe our efforts in retrieving high-quality transcripts for training.

Published

2021

Pages

113-117

Proceedings

Proceedings of IberSPEECH 2021

Conference

IberSPEECH 2021 Conference, Valladolid, ES

Publisher

International Speech Communication Association

Place

Valladolid, ES

DOI

10.21437/IberSPEECH.2021-24

BibTeX

@INPROCEEDINGS{FITPUB12577,
   author = "Martin Kocour and Guillermo C\'{a}mbara and Jordi Luque and David Bonet and Mireia Farr\'{u}s and Martin Karafi\'{a}t and Karel Vesel\'{y} and Jan \v{C}ernock\'{y}",
   title = "BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge",
   pages = "113--117",
   booktitle = "Proceedings of IberSPEECH 2021",
   year = 2021,
   location = "Valladolid, ES",
   publisher = "International Speech Communication Association",
   doi = "10.21437/IberSPEECH.2021-24",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12577"
}