Capsule Routing for Sound Event Detection
The detection of acoustic scenes is a challenging problem in which environmental sound events must be detected from a given audio signal. This includes classifying the events as well as estimating their onset and offset times. We approach this problem with a neural network architecture that uses the recently-proposed capsule routing mechanism. A capsule is a group of activation units representing a set of properties for an entity of interest, and the purpose of routing is to identify part-whole relationships between capsules. That is, a capsule in one layer is assumed to belong to a capsule in the layer above in terms of the entity being represented. Using capsule routing, we wish to train a network that can learn global coherence implicitly, thereby improving generalization performance. Our proposed method is evaluated on Task 4 of the DCASE 2017 challenge. Results show that classification performance is state-of-the-art, achieving an F-score of 58.6 overfitting is reduced considerably compared to other architectures.
READ FULL TEXT