Publication Details

Neural Network Bottleneck Features for Language Identification

MATĚJKA Pavel, ZHANG Le, NG Tim, MALLIDI Sri Harish, GLEMBEK Ondřej, MA Jeff and ZHANG Bing. Neural Network Bottleneck Features for Language Identification. In: Proceedings of Odyssey 2014. Joensuu: International Speech Communication Association, 2014, pp. 299-304. ISSN 2312-2846.

Czech title

Příznaky z neuronové sítě s úzkým hrdlem pro identifikaci jazyka

Type

conference paper

Language

English

Authors

Matějka Pavel, Ing., Ph.D. (DCGM FIT BUT)
Zhang Le (Raytheon BBN)
Ng Tim (Raytheon BBN)
Mallidi Sri Harish (AmazonCom)
Glembek Ondřej, Ing., Ph.D. (DCGM FIT BUT)
Ma Jeff (Raytheon BBN)
Zhang Bing (Raytheon BBN)

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2014/matejka_odyssey2014_299-304-35.pdf

Keywords

language identification, noisy speech, robust feature extraction

Abstract

We present bottleneck features in the context of language identification. The approach combines the benefits of phonotactic and acoustic systems: phonotactic systems usually perform better on long-duration files, while acoustic systems perform better on short ones. In addition, the bottleneck features can model context-dependent phonemes, which brings a notable improvement over context-independent phonemes.

Annotation

This paper presents the application of Neural Network Bottleneck (BN) features to Language Identification (LID). BN features are generally used for Large Vocabulary Speech Recognition in conjunction with conventional acoustic features, such as MFCC or PLP. We compare the BN features to several common types of acoustic features used in state-of-the-art LID systems. The test set is from the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded radio communication channels. On this type of noisy data, we show that, on average, the BN features provide a 45% relative improvement in the Cavg or Equal Error Rate (EER) metrics across several test duration conditions, with respect to our single best acoustic features.
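
For readers unfamiliar with the metrics named in the annotation, the following is a minimal sketch of how a relative improvement and an Equal Error Rate could be computed. It is not the authors' evaluation code: the function names, the threshold sweep, and any scores passed in are illustrative assumptions, and Cavg is not covered here.

```python
def relative_improvement(baseline_error, new_error):
    """Relative reduction of an error metric, as a fraction of the baseline.

    Example of the arithmetic only; inputs are hypothetical, not results
    taken from the paper.
    """
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - new_error) / baseline_error


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    as the mean of the miss and false-alarm rates at the threshold where the
    two rates are closest (a simple approximation, assumed for illustration).
    """
    best_gap, eer = None, None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # Miss: a target trial scored below the threshold.
        miss = sum(s < threshold for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        false_alarm = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer
```

Under this definition, a hypothetical baseline error of 10.0% reduced to 5.5% corresponds to a relative improvement of 0.45, that is, 45%.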

Published

2014

Pages

299-304

Journal

Proceedings of Odyssey: The Speaker and Language Recognition Workshop, vol. 2014, no. 6, ISSN 2312-2846

Proceedings

Proceedings of Odyssey 2014

Conference

Odyssey 2014: The Speaker and Language Recognition Workshop, Joensuu, FI

Publisher

International Speech Communication Association

Place

Joensuu, FI

EID Scopus

2-s2.0-85073163761

BibTeX

@INPROCEEDINGS{FITPUB10686,
   author = "Pavel Mat\v{e}jka and Le Zhang and Tim Ng and Harish Sri Mallidi and Ond\v{r}ej Glembek and Jeff Ma and Bing Zhang",
   title = "Neural Network Bottleneck Features for Language Identification",
   pages = "299--304",
   booktitle = "Proceedings of Odyssey 2014",
   journal = "Proceedings of Odyssey: The Speaker and Language Recognition Workshop",
   volume = 2014,
   number = 6,
   year = 2014,
   location = "Joensuu, FI",
   publisher = "International Speech Communication Association",
   ISSN = "2312-2846",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/10686"
}