Publication Details

Multimodal Phoneme Recognition of Meeting Data

MOTLÍČEK Petr and ČERNOCKÝ Jan. Multimodal Phoneme Recognition of Meeting Data. In: 7th International Conference, TSD 2004 Brno, Czech Republic, September 2004 Proceedings. Brno: Springer Verlag, 2004, pp. 379-384. ISBN 3-540-23049-1. ISSN 0302-9743.

Czech title

Multimodální rozpoznávání fonémů na meeting datech

Type

conference paper

Language

English

Authors

Motlíček Petr, doc. Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)

URL

http://www.fit.vutbr.cz/~motlicek/publi/2004/tsd.pdf

Keywords

speech processing, audio-video processing, feature extraction, pattern recognition

Abstract

Multimodal Phoneme Recognition of Meeting Data

Annotation

This paper describes experiments in automatic recognition of context-independent phoneme strings from meeting data using audio-visual features. Visual features are known to improve the accuracy and noise robustness of automatic speech recognizers. However, many problems appear when the data are not "visually clean", that is, when variation in the speaker's frontal pose, lighting conditions, background, etc. is not limited. The goal of this work was to test whether visual information can help phoneme recognition with neural nets. While the audio part is fixed and uses standard Mel filter-bank energies, several video features were tested: average brightness, DCT coefficients extracted from a region of interest (ROI), optical flow analysis, and lip-position features. Recognition was evaluated on a subset of the IDIAP meeting room data. We observed a small improvement over audio-only recognition, but further work is needed, especially on determining the reliability of the video features.
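
As a rough illustration of two of the video features named above (average brightness and low-order DCT coefficients of an ROI), the following minimal sketch may help. It is not the authors' implementation: the function names, the ROI coordinates, the orthonormal DCT-II scaling, and the number of retained coefficients (`n_coeffs`) are all assumptions made for this example, since the annotation does not specify them.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n (assumed scaling)."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] *= np.sqrt(1.0 / n)
    m[1:, :] *= np.sqrt(2.0 / n)
    return m

def roi_features(frame, top, left, height, width, n_coeffs=3):
    """Return (average brightness, low-order 2D DCT coefficients) of an ROI.

    frame is a 2D grayscale image; the ROI position and n_coeffs are
    hypothetical parameters chosen for this sketch.
    """
    roi = frame[top:top + height, left:left + width].astype(np.float64)
    brightness = roi.mean()
    # Separable 2D DCT: transform the rows, then the columns.
    coeffs = dct2_matrix(height) @ roi @ dct2_matrix(width).T
    # Keep the top-left n_coeffs x n_coeffs block (lowest spatial frequencies).
    low = coeffs[:n_coeffs, :n_coeffs].ravel()
    return brightness, low

# Usage on a synthetic grayscale frame (placeholder data, not meeting video).
frame = np.full((20, 30), 100.0)
brightness, dct_low = roi_features(frame, top=2, left=3, height=8, width=8)
```

In an audio-visual setup such per-frame values would typically be appended to the audio feature vector for the same time instant; how the paper synchronises and combines the two streams is not stated in this record.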

Published

2004

Pages

379-384

Journal

Lecture Notes in Computer Science, vol. 2004, no. 9, ISSN 0302-9743

Proceedings

7th International Conference, TSD 2004 Brno, Czech Republic, September 2004 Proceedings

Conference

Seventh International Conference on Text, Speech and Dialogue, Brno, CZ

ISBN

3-540-23049-1

Publisher

Springer Verlag

Place

Brno, CZ

BibTeX

@INPROCEEDINGS{FITPUB7487,
   author = "Petr Motl\'{i}\v{c}ek and Jan \v{C}ernock\'{y}",
   title = "Multimodal Phoneme Recognition of Meeting Data",
   pages = "379--384",
   booktitle = "7th International Conference, TSD 2004 Brno, Czech Republic, September 2004 Proceedings",
   journal = "Lecture Notes in Computer Science",
   volume = 2004,
   number = 09,
   year = 2004,
   location = "Brno, CZ",
   publisher = "Springer Verlag",
   ISBN = "3-540-23049-1",
   ISSN = "0302-9743",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/7487"
}