We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations cap...
In egocentric videos, actions occur in quick succession. We capitalise o...
We propose a two-stream convolutional network for audio recognition, tha...
We focus on multi-modal fusion for egocentric action recognition, and pr...
First-person vision is gaining interest as it offers a unique viewpoint ...
In this work, a novel method based on the learning using privileged info...