Publication Details
Convolutional Neural Networks and X-Vector Embedding for DCASE2018 Acoustic Scene Classification Challenge
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
Audio scene classification, Convolutional neural networks, Deep learning, x-vectors, Regularized LDA
This paper describes the Brno University of Technology (BUT) team's submissions to Task 1 (Acoustic Scene Classification, ASC) of the DCASE-2018 challenge and provides an analysis of different methods on the leaderboard set. The proposed approach is a fusion of two different Convolutional Neural Network (CNN) topologies. The first is a conventional two-dimensional CNN of the kind mainly used in image classification. The second is a one-dimensional CNN that extracts fixed-length embeddings of audio segments, so-called x-vectors, which have also been used in speech processing, especially for speaker recognition. In addition to the two topologies, two types of features were tested: log mel-spectrograms and CQT features. Finally, in the best-performing system, the outputs of the different systems are fused by simple output averaging. Our submissions ranked third among 24 teams in ASC sub-task A (Task 1a).
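The fusion step mentioned in the abstract (simple output averaging) can be illustrated with a minimal sketch. It assumes that each system emits one vector of per-class scores (e.g. softmax posteriors) over the same ordered set of scene classes, and that the fused scores are an unweighted mean. The function names, class labels and score values below are hypothetical and are not taken from the paper or its code.

```python
from __future__ import annotations

from typing import Sequence


def average_fusion(system_scores: Sequence[Sequence[float]]) -> list[float]:
    """Fuse per-class scores from several systems by unweighted averaging.

    Hypothetical helper: assumes every system scores the same classes
    in the same order.
    """
    if not system_scores:
        raise ValueError("need at least one system")
    n_classes = len(system_scores[0])
    if any(len(scores) != n_classes for scores in system_scores):
        raise ValueError("all systems must score the same classes")
    n_systems = len(system_scores)
    # Average each class score across all systems.
    return [sum(scores[c] for scores in system_scores) / n_systems
            for c in range(n_classes)]


def predict(labels: Sequence[str],
            system_scores: Sequence[Sequence[float]]) -> tuple[str, list[float]]:
    """Return the label with the highest fused score, plus the fused scores."""
    fused = average_fusion(system_scores)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused
```

For example, with three hypothetical systems (say, a 2D CNN on log mel-spectrograms, a 2D CNN on CQT features, and an x-vector classifier) scoring the hypothetical classes `["airport", "park", "tram"]` as `[0.6, 0.3, 0.1]`, `[0.2, 0.5, 0.3]` and `[0.5, 0.4, 0.1]`, the averaged scores favour `"airport"` even though one system prefers `"park"`. Averaging needs no trained fusion parameters, which keeps it robust when little held-out data is available for calibration.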
@INPROCEEDINGS{FITPUB11882,
   author = "Hossein Zeinali and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "Convolutional Neural Networks and X-Vector Embedding for DCASE2018 Acoustic Scene Classification Challenge",
   pages = "1--5",
   booktitle = "Proceedings of DCASE 2018 Workshop",
   year = 2018,
   location = "Surrey, GB",
   publisher = "Tampere University of Technology",
   ISBN = "978-952-15-4262-6",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11882"
}