Publication Details

Robust Speech Recognition in Unknown Reverberant and Noisy Conditions

HSIAO Roger, MA Jeff, HARTMANN William, KARAFIÁT Martin, GRÉZL František, BURGET Lukáš, SZŐKE Igor, ČERNOCKÝ Jan, WATANABE Shinji, CHEN Zhuo, MALLIDI Sri Harish, HEŘMANSKÝ Hynek, TSAKALIDIS Stavros and SCHWARTZ Richard. Robust Speech Recognition in Unknown Reverberant and Noisy Conditions. In: Proceedings of 2015 IEEE Automatic Speech Recognition and Understanding Workshop. Scottsdale, Arizona: IEEE Signal Processing Society, 2015, pp. 533-538. ISBN 978-1-4799-7291-3.

Czech title

Robustní rozpoznávání řeči v neznámých podmínkách s reverberací a šumem

Type

conference paper

Language

English

Authors

Hsiao Roger (Raytheon BBN)
Ma Jeff (Raytheon BBN)
Hartmann William (Raytheon BBN)
Karafiát Martin, Ing., Ph.D. (DCGM FIT BUT)
Grézl František, Ing., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Szőke Igor, Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
Watanabe Shinji, Dr. (JHU)
Chen Zhuo (Raytheon BBN)
Mallidi Sri Harish (AmazonCom)
Heřmanský Hynek, prof. Ing., Dr.Eng. (DCGM FIT BUT)
Tsakalidis Stavros (Raytheon BBN)
Schwartz Richard (Raytheon BBN)

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2015/hsiao_asru2015_0000533.pdf

Keywords

ASpIRE challenge, robust speech recognition

Abstract

In this paper, we describe our work on the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge, which aims to assess the robustness of automatic speech recognition (ASR) systems. The main characteristic of the challenge is developing a high-performance system without access to matched training and development data. While the evaluation data are recorded with far-field microphones in noisy and reverberant rooms, the training data are telephone speech and close-talking speech. Our approach to this challenge includes speech enhancement, neural network methods and acoustic model adaptation. We show that these techniques can successfully alleviate the performance degradation due to noisy audio and data mismatch.

Annotation

In this paper, we describe our work on the ASpIRE challenge. We experiment with and evaluate different approaches to tackling the performance degradation due to noise and data mismatch. Our approaches include audio enhancement, data augmentation, unsupervised DNN adaptation, and system combination.

Published

2015

Pages

533-538

Proceedings

Proceedings of 2015 IEEE Automatic Speech Recognition and Understanding Workshop

Conference

The 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2015), Scottsdale, Arizona, US

ISBN

978-1-4799-7291-3

Publisher

IEEE Signal Processing Society

Place

Scottsdale, Arizona, US

DOI

10.1109/ASRU.2015.7404841

UT WoS

000380604800076

EID Scopus

2-s2.0-84964470918

BibTeX

@INPROCEEDINGS{FITPUB11067,
   author = "Roger Hsiao and Jeff Ma and William Hartmann and Martin Karafi\'{a}t and Franti\v{s}ek Gr\'{e}zl and Luk\'{a}\v{s} Burget and Igor Sz\H{o}ke and Jan \v{C}ernock\'{y} and Shinji Watanabe and Zhuo Chen and Sri Harish Mallidi and Hynek He\v{r}mansk\'{y} and Stavros Tsakalidis and Richard Schwartz",
   title = "Robust Speech Recognition in Unknown Reverberant and Noisy Conditions",
   pages = "533--538",
   booktitle = "Proceedings of 2015 IEEE Automatic Speech Recognition and Understanding Workshop",
   year = 2015,
   location = "Scottsdale, Arizona, US",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-4799-7291-3",
   doi = "10.1109/ASRU.2015.7404841",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11067"
}