Residual Memory Networks: Feed-forward approach to learn long-term temporal dependencies

Czech title

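Residuální paměťové sítě: nerekurentní přístup k učení dlouhých časových závislostí (English: Residual memory networks: a non-recurrent approach to learning long temporal dependencies)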

Type

conference paper

Language

English

Authors

Baskar Murali K. (DCGM FIT BUT)
Karafiát Martin, Ing., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Veselý Karel, Ing., Ph.D. (DCGM FIT BUT)
Grézl František, Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2017/baskar_icassp2017_0004810.pdf

Keywords

Automatic speech recognition, LSTM, RNN, Residual memory networks.

Abstract

Training deep recurrent neural network (RNN) architectures is complicated due to the increased network complexity. This hinders the learning of higher-order abstractions with deep RNNs. Deep feed-forward networks, by contrast, are simpler and faster to train, but they cannot learn long-term temporal information. In this paper, we propose a residual memory neural network (RMN) architecture that models long-term temporal dependencies using deep feed-forward layers with residual and time-delayed connections. The residual connections pave the way for deeper networks by enabling an unhindered flow of gradients, and the time-delay units capture temporal information with shared weights. The number of layers in an RMN therefore determines both its hierarchical processing depth and its temporal depth. The computational cost of training an RMN is significantly lower than that of deep recurrent networks. RMN is further extended to a bidirectional RMN (BRMN) to capture both past and future information. Experimental analysis on the AMI corpus substantiates the ability of RMN to learn long-term and hierarchical information. The recognition performance of an RMN trained on 300 hours of the Switchboard corpus is compared with various state-of-the-art LVCSR systems. The results indicate that RMN and BRMN achieve 6% and 3.8% relative improvements over LSTM and BLSTM networks, respectively.
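The idea that depth sets both the hierarchical and the temporal depth can be illustrated with a minimal NumPy sketch. It is not the authors' exact layer definition: it assumes a simplified layer h_l[t] = h_{l-1}[t] + relu(W_l h_{l-1}[t] + U h_{l-1}[t-d]), a one-frame delay d per layer, a single delay matrix U shared by all layers, and zero padding at sequence edges. The function names `delay`, `rmn_forward` and `brmn_forward` are hypothetical. With L layers the output at frame t sees input frames t-L*d..t (and additionally up to t+L*d in the bidirectional variant).

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def delay(x, d):
    """Shift a (T, D) sequence by d frames along time.

    d > 0 returns past frames (out[t] = x[t-d]), d < 0 returns future
    frames (out[t] = x[t+|d|]); frames outside the sequence are zeros.
    """
    out = np.zeros_like(x)
    if d > 0:
        out[d:] = x[:-d]
    elif d < 0:
        out[:d] = x[-d:]
    else:
        out[:] = x
    return out


def rmn_forward(x, W_list, U, d=1):
    """Hypothetical unidirectional RMN stack.

    x: (T, D) input features; W_list: per-layer (D, D) feed-forward weights;
    U: (D, D) delay weights shared by every layer (an assumption).
    Each layer widens the past context by d frames.
    """
    h = x
    for W in W_list:
        # Residual connection around a feed-forward + time-delay unit.
        h = h + relu(h @ W + delay(h, d) @ U)
    return h


def brmn_forward(x, W_list, U_past, U_future, d=1):
    """Hypothetical bidirectional variant: each layer adds a past and a
    future delayed input, widening context by d frames in each direction."""
    h = x
    for W in W_list:
        h = h + relu(h @ W + delay(h, d) @ U_past + delay(h, -d) @ U_future)
    return h
```

Because each layer is a single matrix product over the whole (T, D) sequence, all frames of a layer can be computed in parallel, unlike the frame-by-frame recurrence of an LSTM; this is one plausible reading of the lower training cost claimed above, under the sketch's assumptions.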

Published

2017

Pages

4810-4814

Proceedings

Proceedings of ICASSP 2017

Conference

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, USA

ISBN

978-1-5090-4117-6

Publisher

IEEE Signal Processing Society

Place

New Orleans, US

DOI

10.1109/ICASSP.2017.7953070

UT WoS

000414286204194

EID Scopus

2-s2.0-85023739371

BibTeX

@INPROCEEDINGS{FITPUB11467,
   author = "K. Murali Baskar and Martin Karafi\'{a}t and Luk\'{a}\v{s} Burget and Karel Vesel\'{y} and Franti\v{s}ek Gr\'{e}zl and Jan \v{C}ernock\'{y}",
   title = "Residual Memory Networks: Feed-forward approach to learn long-term temporal dependencies",
   pages = "4810--4814",
   booktitle = "Proceedings of ICASSP 2017",
   year = 2017,
   location = "New Orleans, US",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-5090-4117-6",
   doi = "10.1109/ICASSP.2017.7953070",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11467"
}