Classifying Variable-Length Audio Files with All-Convolutional Networks and Masked Global Pooling

07/11/2016
by Lars Hertel, et al.

We trained a deep all-convolutional neural network with masked global pooling to perform single-label classification for acoustic scene classification and multi-label classification for domestic audio tagging in the DCASE-2016 contest. Our network achieved an average accuracy of 84.5% on the four-fold cross-validation for acoustic scene recognition, compared to the provided baseline of 72.5%, and an average equal error rate of 0.17 for domestic audio tagging, compared to the baseline of 0.21. The network therefore improves on the baselines by a relative amount of 17% and 19%, respectively. The network consists only of convolutional layers to extract features from the short-time Fourier transform and one global pooling layer to combine those features. Notably, it has neither fully-connected layers, apart from the fully-connected output layer, nor dropout layers.
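The idea behind masked global pooling can be illustrated with a minimal sketch. It assumes that variable-length clips are zero-padded to a common number of time frames and that the pooling step should ignore the padded frames. The abstract does not state which pooling statistic the authors use, so the mean used here is an assumption, and the function name `masked_global_mean_pool` and the toy data are hypothetical.

```python
def masked_global_mean_pool(feature_maps, lengths):
    """Average each channel over its valid time frames only.

    feature_maps: list of examples, each a list of channels, each a list of
                  per-frame activations, zero-padded to a common length.
    lengths:      number of valid (unpadded) frames per example.
    """
    pooled = []
    for channels, n_valid in zip(feature_maps, lengths):
        if n_valid <= 0:
            raise ValueError("each example needs at least one valid frame")
        # Slice off the padding so it cannot dilute the average.
        pooled.append([sum(ch[:n_valid]) / n_valid for ch in channels])
    return pooled


# Two hypothetical clips with 2 channels each, padded to 4 frames.
batch = [
    [[1.0, 3.0, 0.0, 0.0], [2.0, 2.0, 0.0, 0.0]],  # 2 valid frames
    [[1.0, 2.0, 3.0, 6.0], [4.0, 4.0, 4.0, 4.0]],  # 4 valid frames
]
lengths = [2, 4]

pooled = masked_global_mean_pool(batch, lengths)
print(pooled)  # [[2.0, 2.0], [3.0, 4.0]]
```

Without the mask, the first channel of the shorter clip would average to 1.0 instead of 2.0, because the two padded zeros would count as real frames. Each clip ends up as one fixed-size vector regardless of its duration, which is what lets a single output layer follow the pooling step.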
