Publication Details

Speech Technology for Unwritten Languages

SCHARENBORG Odette, BESACIER Laurent, BLACK Alan, HASEGAWA-JOHNSON Mark, METZE Florian, NEUBIG Graham, STÜKER Sebastian, GODARD Pierre, MÜLLER Markus, ONDEL Yang Lucas Antoine Francois, PALASKAR Shruti, ARTHUR Philip, CIANNELLA Francesco, DU Mingxing, LARSEN Elin, MERKX Danny, RIAD Rachid, WANG Liming and DUPOUX Emmanuel. Speech Technology for Unwritten Languages. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, 2020, vol. 28, pp. 964-975. ISSN 2329-9290. Available from: https://ieeexplore.ieee.org/document/8998182

Czech title

Řečové technologie pro jazyky bez psané formy

Type

journal article

Language

English

Authors

Scharenborg Odette (RUN)
Besacier Laurent (UGA)
Black Alan (CMU)
Hasegawa-Johnson Mark (UILLINOIS)
Metze Florian (CMU)
Neubig Graham (CMU)
Stüker Sebastian (KIT)
Godard Pierre (LIMSI)
Müller Markus (KIT)
Ondel Yang Lucas Antoine Francois, Mgr., Ph.D. (DCGM FIT BUT)
Palaskar Shruti (CMU)
Arthur Philip (CMU)
Ciannella Francesco (CMU)
Du Mingxing (INRIA)
Larsen Elin (INRIA)
Merkx Danny (RUN)
Riad Rachid (INRIA)
Wang Liming (UILLINOIS)
Dupoux Emmanuel (ENS)

Keywords

Speech processing, automatic speech recognition, unsupervised learning, speech synthesis, image retrieval.

Abstract

Speech technology plays an important role in our everyday life. Among others, speech is used for human-computer interaction, for instance for information retrieval and on-line shopping. In the case of an unwritten language, however, speech technology is unfortunately difficult to create, because it cannot be created by the standard combination of pre-trained speech-to-text and text-to-speech subsystems. The research presented in this article takes the first steps towards speech technology for unwritten languages. Specifically, the aim of this work was 1) to learn speech-to-meaning representations without using text as an intermediate representation, and 2) to test the sufficiency of the learned representations to regenerate speech or translated text, or to retrieve images that depict the meaning of an utterance in an unwritten language. The results suggest that building systems that go directly from speech-to-meaning and from meaning-to-speech, bypassing the need for text, is possible.

Published

2020

Pages

964-975

Journal

IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, vol. 28, ISSN 2329-9290

Publisher

IEEE Signal Processing Society

DOI

10.1109/TASLP.2020.2973896

UT WoS

000522357500002

EID Scopus

2-s2.0-85079642575

BibTeX

@ARTICLE{FITPUB12469,
   author = "Odette Scharenborg and Laurent Besacier and Alan Black and Mark Hasegawa-Johnson and Florian Metze and Graham Neubig and Sebastian St{\"{u}}ker and Pierre Godard and Markus M{\"{u}}ller and Francois Antoine Lucas Yang Ondel and Shruti Palaskar and Philip Arthur and Francesco Ciannella and Mingxing Du and Elin Larsen and Danny Merkx and Rachid Riad and Liming Wang and Emmanuel Dupoux",
   title = "Speech Technology for Unwritten Languages",
   pages = "964--975",
   journal = "IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING",
   volume = 28,
   year = 2020,
   ISSN = "2329-9290",
   doi = "10.1109/TASLP.2020.2973896",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12469"
}