Result Details

Similarity Scoring for Recognizing Repeated Out-of-Vocabulary Words

HANNEMANN, M.; KOMBRINK, S.; KARAFIÁT, M.; BURGET, L. Similarity Scoring for Recognizing Repeated Out-of-Vocabulary Words. Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010). Proceedings of Interspeech. Makuhari, Chiba: International Speech Communication Association, 2010. no. 9, p. 897-900. ISBN: 978-1-61782-123-3. ISSN: 1990-9772.

Type

conference paper

Language

English

Authors

Hannemann Mirko, Ph.D., FIT (FIT), DCGM (FIT)
Kombrink Stefan, Dipl.-Linguist., FIT (FIT), DCGM (FIT)
Karafiát Martin, Ing., Ph.D., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)

Abstract

This paper describes the development of a similarity measure for detecting repeatedly occurring Out-of-Vocabulary words (OOV), which carry important information.

Keywords

out-of-vocabulary, OOV, hybrid word/sub-word recognizer, similarity measure, alignment error model

URL

https://www.fit.vut.cz/research/group/speech/public/publi/2010/hanneman…

Annotation

We develop a similarity measure to detect repeatedly occurring Out-of-Vocabulary words (OOV), since these carry important information. Sub-word sequences in the recognition output from a hybrid word/sub-word recognizer are taken as detected OOVs and are aligned to each other with the help of an alignment error model. This model is able to deal with partial OOV detections and tries to reveal more complex word relations such as compound words. We apply the model to a selection of conversational phone calls to retrieve other examples of the same OOV, and to obtain a higher-level description of it such as being a derivation of a known word.
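As a rough illustration of what aligning two detected sub-word sequences can look like, the sketch below computes a weighted edit distance between two sequences of sub-word units. It is not the alignment error model from the paper: the function name `edit_distance`, the uniform costs `sub_cost` and `indel_cost`, and the `free_ends` switch are all assumptions made for this example. The `free_ends` option lets one sequence match any contiguous part of the other at no extra cost, which loosely mirrors the idea of a partial OOV detection.

```python
from typing import Sequence


def edit_distance(
    a: Sequence[str],
    b: Sequence[str],
    sub_cost: float = 1.0,    # assumed uniform substitution cost
    indel_cost: float = 1.0,  # assumed uniform insertion/deletion cost
    free_ends: bool = False,
) -> float:
    """Weighted edit distance between two sub-word sequences.

    With free_ends=True, unmatched units at the start and end of `b`
    cost nothing, so `a` may align to any contiguous part of `b`.
    """
    # First row: skipping a prefix of `b` is free only when free_ends is set.
    prev = [0.0 if free_ends else j * indel_cost for j in range(len(b) + 1)]
    for i, unit_a in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, unit_b in enumerate(b, start=1):
            diag = prev[j - 1] + (0.0 if unit_a == unit_b else sub_cost)
            curr.append(min(diag, prev[j] + indel_cost, curr[j - 1] + indel_cost))
        prev = curr
    # Last row: skipping a suffix of `b` is free only when free_ends is set.
    return min(prev) if free_ends else prev[-1]


if __name__ == "__main__":
    # Hypothetical phoneme strings for a partial and a full detection.
    partial = ["b", "c"]
    full = ["a", "b", "c", "d"]
    print(edit_distance(partial, full))
    print(edit_distance(partial, full, free_ends=True))
```

A learned error model would replace the uniform costs with unit-specific confusion costs; the dynamic programming recursion itself would stay the same.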

Published

2010

Pages

897–900

Journal

Proceedings of Interspeech, vol. 2010, no. 9, ISSN 1990-9772

Proceedings

Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010)

Conference

Interspeech Conference

ISBN

978-1-61782-123-3

Publisher

International Speech Communication Association

Place

Makuhari, Chiba

BibTeX

@inproceedings{BUT34859,
  author="Mirko {Hannemann} and Stefan {Kombrink} and Martin {Karafiát} and Lukáš {Burget}",
  title="Similarity Scoring for Recognizing Repeated Out-of-Vocabulary Words",
  booktitle="Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010)",
  year="2010",
  journal="Proceedings of Interspeech",
  volume="2010",
  number="9",
  pages="897--900",
  publisher="International Speech Communication Association",
  address="Makuhari, Chiba",
  isbn="978-1-61782-123-3",
  issn="1990-9772",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2010/hanneman_interspeech2010_IS100358.pdf"
}

Projects

DIRAC - Detection and Identification of Rare Audio-visual Cues, MŠMT, Sixth Framework Programme of the European Community for research, technological development and demonstration activities, 027787, start: 2006-01-01, end: 2010-12-31, completed
Recognition and presentation of multimedia data, BUT, BUT internal projects, FIT-S-10-2, 2010, start: 2010-04-01, end: 2010-12-31, completed
Security-Oriented Research in Information Technology, MŠMT, Institutional funds from the state budget of the Czech Republic (e.g. research plans, research centres), MSM0021630528, start: 2007-01-01, end: 2013-12-31, running
Speech Recognition under Real-World Conditions, GACR, Standard projects, GA102/08/0707, start: 2008-01-01, end: 2011-12-31, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)