Publication Details

Bayesian joint-sequence models for grapheme-to-phoneme conversion

HANNEMANN Mirko, TRMAL Jan, ONDEL Yang Lucas Antoine Francois, KESIRAJU Santosh and BURGET Lukáš. Bayesian joint-sequence models for grapheme-to-phoneme conversion. In: Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017, pp. 2836-2840. ISBN 978-1-5090-4117-6.
Czech title
Bayesovské modelování sdružených sekvencí pro převod grafémů na fonémy
Type
conference paper
Language
English
Authors
Hannemann Mirko, Dipl.-Ing. (DCGM FIT BUT)
Trmal Jan (JHU)
Ondel Yang Lucas Antoine Francois, Mgr., Ph.D. (DCGM FIT BUT)
Kesiraju Santosh (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Keywords

Bayesian approach, joint-sequence models, weighted finite state transducers, letter-to-sound, grapheme-to-phoneme conversion, hierarchical Pitman-Yor-Process

Abstract

We describe a fully Bayesian approach to grapheme-to-phoneme conversion based on the joint-sequence model (JSM). Usually, standard smoothed n-gram language models (LM, e.g. Kneser-Ney) are used with JSMs to model graphone sequences (joint grapheme-phoneme pairs). However, we take a Bayesian approach using a hierarchical Pitman-Yor-Process LM. This provides an elegant alternative to using smoothing techniques to avoid over-training. No held-out sets or complex parameter tuning are necessary, and several convergence problems encountered in the discounted Expectation-Maximization (as used in the smoothed JSMs) are avoided. Every step is modeled by weighted finite state transducers and implemented with standard operations from the OpenFST toolkit. We evaluate our model on a standard data set (CMUdict), where it gives comparable results to the previously reported smoothed JSMs in terms of phoneme-error rate while requiring much shorter training/testing time. Most importantly, our model can be used in a Bayesian framework and for (partly) unsupervised training.

Published
2017
Pages
2836-2840
Proceedings
Proceedings of ICASSP 2017
Conference
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, US
ISBN
978-1-5090-4117-6
Publisher
IEEE Signal Processing Society
Place
New Orleans, US
DOI
10.1109/ICASSP.2017.7952674
UT WoS
000414286203002
BibTeX
@INPROCEEDINGS{FITPUB11469,
   author = "Mirko Hannemann and Jan Trmal and Francois Antoine Lucas Yang Ondel and Santosh Kesiraju and Luk\'{a}\v{s} Burget",
   title = "Bayesian joint-sequence models for grapheme-to-phoneme conversion",
   pages = "2836--2840",
   booktitle = "Proceedings of ICASSP 2017",
   year = 2017,
   location = "New Orleans, US",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-5090-4117-6",
   doi = "10.1109/ICASSP.2017.7952674",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11469"
}