Classifying Variable-Length Audio Files with All-Convolutional Networks and Masked Global Pooling

07/11/2016

We trained a deep all-convolutional neural network with masked global pooling to perform single-label classification for acoustic scene classification and multi-label classification for domestic audio tagging in the DCASE-2016 contest. Our network achieved an average accuracy of 84.5% in cross-validation for acoustic scene recognition, compared to the provided baseline of 72.5%, and [...] for domestic audio tagging, compared to the baseline of 0.21. The network therefore improves the baselines by a relative amount of 17% [...]. It consists of convolutional layers that extract features from the short-time Fourier transform and one global pooling layer that combines those features. Notably, it has neither fully-connected layers, apart from the fully-connected output layer, nor dropout layers.
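
The abstract does not spell out how the masked global pooling is computed, so the following is only a minimal sketch of one plausible reading: clips of different lengths are zero-padded to a common number of time frames, and the pooling averages each feature channel over the valid frames only. The function name `masked_global_mean_pool`, the `(batch, time, channels)` layout, and the choice of mean pooling are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def masked_global_mean_pool(features, lengths):
    """Average features over time, ignoring padded frames.

    features: array of shape (batch, time, channels), padded along time.
    lengths:  number of valid frames per clip, shape (batch,).
    Hypothetical helper; the paper's exact pooling may differ.
    """
    features = np.asarray(features, dtype=np.float64)
    lengths = np.asarray(lengths)
    time_steps = features.shape[1]
    # mask[b, t] is 1.0 where frame t belongs to clip b and 0.0 where it is padding
    mask = (np.arange(time_steps)[None, :] < lengths[:, None]).astype(features.dtype)
    # Zero out padded frames, then sum over the time axis
    summed = (features * mask[:, :, None]).sum(axis=1)
    # Divide by the true length (at least 1, to avoid division by zero)
    return summed / np.maximum(lengths, 1)[:, None]

# Two clips padded to 4 frames with 3 channels; the first has only 2 valid frames
batch = np.zeros((2, 4, 3))
batch[0, :2] = [[1, 2, 3], [3, 4, 5]]
batch[0, 2:] = 99.0  # arbitrary values in the padded region, which the mask must ignore
batch[1, :] = 2.0
pooled = masked_global_mean_pool(batch, [2, 4])
```

Because the result has one fixed-size vector per clip regardless of clip length, a single output layer can follow directly, which matches the abstract's description of a network without intermediate fully-connected layers.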
