Bayesian joint-sequence models for grapheme-to-phoneme conversion

HANNEMANN, M.; TRMAL, J.; ONDEL YANG, L.; KESIRAJU, S.; BURGET, L. Bayesian joint-sequence models for grapheme-to-phoneme conversion. In Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017. p. 2836-2840. ISBN: 978-1-5090-4117-6.

Type

conference paper

Language

English

Authors

Hannemann Mirko, Ph.D.
Trmal Jan, Ing., Ph.D.
Ondel Lucas Antoine Francois, Mgr., Ph.D., DCGM (FIT)
Kesiraju Santosh, Ph.D., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)

Abstract

We describe a fully Bayesian approach to grapheme-to-phoneme conversion based on the joint-sequence model (JSM). Usually, standard smoothed n-gram language models (LM, e.g. Kneser-Ney) are used with JSMs to model graphone sequences (joint grapheme-phoneme pairs). However, we take a Bayesian approach using a hierarchical Pitman-Yor-Process LM. This provides an elegant alternative to using smoothing techniques to avoid over-training. No held-out sets and complex parameter tuning is necessary, and several convergence problems encountered in the discounted Expectation-Maximization (as used in the smoothed JSMs) are avoided. Every step is modeled by weighted finite state transducers and implemented with standard operations from the OpenFST toolkit. We evaluate our model on a standard data set (CMUdict), where it gives comparable results to the previously reported smoothed JSMs in terms of phoneme-error rate while requiring a much smaller training/testing time. Most importantly, our model can be used in a Bayesian framework and for (partly) un-supervised training.
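As a rough illustration of the kind of predictive rule a Pitman-Yor-Process LM relies on, the sketch below computes the standard Chinese-restaurant predictive probability for a single, non-hierarchical level. It is not the authors' implementation, which the abstract says is built from weighted finite state transducers and OpenFST operations. The function name, the graphone labels such as `a:AH`, the seating counts, and the uniform base distribution are all assumptions made for this example.

```python
from collections import Counter


def pitman_yor_predictive(unit, customers, tables, discount, strength, base_prob):
    """Predictive probability of `unit` under one Pitman-Yor restaurant.

    customers[u]: number of observations of u seated so far (hypothetical counts)
    tables[u]:    number of tables serving u
    discount:     Pitman-Yor discount d, with 0 <= d < 1
    strength:     Pitman-Yor strength (concentration) theta, with theta > -d
    base_prob:    probability of `unit` under the base distribution
                  (in a hierarchical model this would come from the
                  shorter-context restaurant; here it is just a number)
    """
    total_customers = sum(customers.values())
    total_tables = sum(tables.values())
    denom = strength + total_customers

    # Mass from joining an existing table that already serves `unit`.
    seated = max(customers[unit] - discount * tables[unit], 0.0)
    # Mass from opening a new table, weighted by the base distribution.
    new_table = (strength + discount * total_tables) * base_prob
    return (seated + new_table) / denom


# Hypothetical toy state: four possible graphones, two of them observed.
vocab = ["a:AH", "b:B", "c:K", "t:T"]
customers = Counter({"a:AH": 3, "b:B": 1})
tables = Counter({"a:AH": 1, "b:B": 1})
uniform = 1.0 / len(vocab)

probs = {
    u: pitman_yor_predictive(u, customers, tables, 0.5, 1.0, uniform)
    for u in vocab
}
```

In this toy state the discount removes a little mass from each observed graphone and hands it to the base distribution, so unseen graphones still receive non-zero probability without a separate smoothing step or held-out tuning set.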

Keywords

Bayesian approach, joint-sequence models, weighted finite state transducers, letter-to-sound, grapheme-to-phoneme conversion, hierarchical Pitman-Yor-Process

URL

https://www.fit.vut.cz/research/group/speech/public/publi/2017/hannemann… PDF

Annotation

We describe a fully Bayesian approach to grapheme-to-phoneme conversion based on the joint-sequence model (JSM). Usually, standard smoothed n-gram language models (LM, e.g. Kneser-Ney) are used with JSMs to model graphone sequences (joint grapheme-phoneme pairs). However, we take a Bayesian approach using a hierarchical Pitman-Yor-Process LM. This provides an elegant alternative to using smoothing techniques to avoid over-training. No held-out sets and complex parameter tuning is necessary, and several convergence problems encountered in the discounted Expectation-Maximization (as used in the smoothed JSMs) are avoided. Every step is modeled by weighted finite state transducers and implemented with standard operations from the OpenFST toolkit. We evaluate our model on a standard data set (CMUdict), where it gives comparable results to the previously reported smoothed JSMs in terms of phoneme-error rate while requiring a much smaller training/testing time. Most importantly, our model can be used in a Bayesian framework and for (partly) un-supervised training.

Published

2017

Pages

2836–2840

Proceedings

Proceedings of ICASSP 2017

Conference

2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)

ISBN

978-1-5090-4117-6

Publisher

IEEE Signal Processing Society

Place

New Orleans

DOI

10.1109/ICASSP.2017.7952674

UT WoS

000414286203002

EID Scopus

2-s2.0-85023740605

BibTeX

@inproceedings{BUT144449,
  author="Mirko {Hannemann} and Jan {Trmal} and Lucas Antoine Francois {Ondel} and Santosh {Kesiraju} and Lukáš {Burget}",
  title="Bayesian joint-sequence models for grapheme-to-phoneme conversion",
  booktitle="Proceedings of ICASSP 2017",
  year="2017",
  pages="2836--2840",
  publisher="IEEE Signal Processing Society",
  address="New Orleans",
  doi="10.1109/ICASSP.2017.7952674",
  isbn="978-1-5090-4117-6",
  url="https://www.fit.vut.cz/research/publication/11469/"
}

Files

pdf hannemann_icassp2017_0002836.pdf 377 kB

Projects

DARPA Low Resource Languages for Emergent Incidents (LORELEI) - Exploiting Language Information for Situational Awareness (ELISA), University of Southern California, start: 2015-09-01, end: 2020-03-31, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)