Publication Details

Unsupervised Word Segmentation from Speech with Attention

GODARD, P.; BOITO, M.; ONDEL YANG, L.; BERARD, A.; YVON, F.; VILLAVICENCIO, A.; BESACIER, L. Unsupervised Word Segmentation from Speech with Attention. In Proceeding of Interspeech 2018. Proceedings of Interspeech. Hyderabad: International Speech Communication Association, 2018. p. 2678-2682. ISSN: 1990-9772.

Czech title

Segmentace řeči na slova bez supervize s pozornostním modelem

Type

conference paper

Language

English

Authors

GODARD, P.
BOITO, M.
ONDEL YANG, L.
BERARD, A.
YVON, F.
VILLAVICENCIO, A.
BESACIER, L.

URL

https://www.isca-speech.org/archive/Interspeech_2018/pdfs/1308.pdf

Keywords

computational language documentation, encoder-decoder models, attentional models, unsupervised word segmentation

Abstract

We present a first attempt to perform attentional word segmentation directly from the speech signal, with the final goal to automatically identify lexical units in a low-resource, unwritten language (UL). Our methodology assumes a pairing between recordings in the UL with translations in a well-resourced language. It uses Acoustic Unit Discovery (AUD) to convert speech into a sequence of pseudo-phones that is segmented using neural soft-alignments produced by a neural machine translation model. Evaluation uses an actual Bantu UL, Mboshi; comparisons to monolingual and bilingual baselines illustrate the potential of attentional word segmentation for language documentation.
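As an illustration of the segmentation step described in the abstract, the sketch below shows one simple way soft alignments could be turned into word boundaries. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the boundary rule, so the rule used here (assign each pseudo-phone to the translation word with the highest attention weight, and start a new segment whenever that word changes) is an assumption. The function name `segment_by_attention`, the pseudo-phone labels, and the attention values are hypothetical.

```python
from typing import List, Sequence


def segment_by_attention(
    units: Sequence[str],
    attention: Sequence[Sequence[float]],
) -> List[List[str]]:
    """Group pseudo-phones into segments using a soft-alignment matrix.

    attention[i][j] is the (assumed) attention weight between pseudo-phone i
    and word j of the translation. The boundary rule is an assumption made
    for illustration: a new segment starts whenever the best-aligned
    translation word changes.
    """
    if len(units) != len(attention):
        raise ValueError("one attention row is required per pseudo-phone")

    segments: List[List[str]] = []
    previous_word = None
    for unit, row in zip(units, attention):
        # Hard-assign the pseudo-phone to its most attended translation word.
        best_word = max(range(len(row)), key=row.__getitem__)
        if best_word != previous_word:
            segments.append([])  # alignment target changed: open a new segment
        segments[-1].append(unit)
        previous_word = best_word
    return segments


# Hypothetical pseudo-phones and attention weights, for illustration only.
example_units = ["a1", "a7", "a3", "a3", "a9"]
example_attention = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
]
example_segments = segment_by_attention(example_units, example_attention)
```

With these hypothetical inputs, the first two pseudo-phones align best to the first translation word, the next two to the second, and the last to the third, so three segments are produced.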

Published

2018

Pages

2678–2682

Journal

Proceedings of Interspeech, vol. 2018, no. 9, ISSN 1990-9772

Proceedings

Proceeding of Interspeech 2018

Conference

Interspeech Conference, Hyderabad, India, IN

Publisher

International Speech Communication Association

Place

Hyderabad

DOI

10.21437/Interspeech.2018-1308

UT WoS

000465363900561

EID Scopus

2-s2.0-85054965945

BibTeX

@inproceedings{BUT163406,
  author="GODARD, P. and BOITO, M. and ONDEL YANG, L. and BERARD, A. and YVON, F. and VILLAVICENCIO, A. and BESACIER, L.",
  title="Unsupervised Word Segmentation from Speech with Attention",
  booktitle="Proceeding of Interspeech 2018",
  year="2018",
  journal="Proceedings of Interspeech",
  volume="2018",
  number="9",
  pages="2678--2682",
  publisher="International Speech Communication Association",
  address="Hyderabad",
  doi="10.21437/Interspeech.2018-1308",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/Interspeech_2018/pdfs/1308.pdf"
}

Files

pdf godard_interspeech2018_1308.pdf 328 kB