Publication Details

Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models

KESIRAJU Santosh, SARVAŠ Marek, PAVLÍČEK Tomáš, MACAIRE Cécile and CIUBA Alejandro. Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Dublin: International Speech Communication Association, 2023, pp. 2148-2152. ISSN 1990-9772. Available from: https://www.isca-speech.org/archive/pdfs/interspeech_2023/kesiraju23_interspeech.pdf
Czech title
Strategie pro zlepšení překladu řeči na text s omezenými zdroji založená na předtrénovaných modelech ASR
Type
conference paper
Language
English
Authors
Kesiraju Santosh (DCGM FIT BUT)
Sarvaš Marek, Bc. (DCGM FIT BUT)
Pavlíček Tomáš, Ing. (Phonexia)
Macaire Cécile (UGA)
Ciuba Alejandro
Keywords

speech translation, low-resource, multilingual, speech recognition

Abstract

This paper presents techniques and findings for improving the performance of low-resource speech-to-text translation (ST). We conducted experiments on both simulated and real low-resource setups, on the language pairs English - Portuguese and Tamasheq - French, respectively. Using the encoder-decoder framework for ST, our results show that a multilingual automatic speech recognition system acts as a good initialization under low-resource scenarios. Furthermore, using CTC as an additional objective for translation during training and decoding helps to reorder the internal representations and improves the final translation. Through our experiments, we try to identify the factors (initializations, objectives, and hyperparameters) that contribute the most to improvements in low-resource setups. With only 300 hours of pre-training data, our model achieved a BLEU score of 7.3 on Tamasheq - French data, outperforming prior published works from IWSLT 2022 by 1.6 points.

Published
2023
Pages
2148-2152
Journal
Proceedings of Interspeech - on-line, vol. 2023, no. 8, ISSN 1990-9772
Proceedings
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Conference
Interspeech Conference, Dublin, IE
Publisher
International Speech Communication Association
Place
Dublin, IE
DOI
10.21437/Interspeech.2023-2506
BibTeX
@INPROCEEDINGS{FITPUB13109,
   author = "Santosh Kesiraju and Marek Sarva\v{s} and Tom\'{a}\v{s} Pavl\'{i}\v{c}ek and C\'{e}cile Macaire and Alejandro Ciuba",
   title = "Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models",
   pages = "2148--2152",
   booktitle = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
   journal = "Proceedings of Interspeech - on-line",
   volume = 2023,
   number = 08,
   year = 2023,
   location = "Dublin, IE",
   publisher = "International Speech Communication Association",
   ISSN = "1990-9772",
   doi = "10.21437/Interspeech.2023-2506",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13109"
}