In this paper, we re-examine the task of cross-modal clip-sentence retri...
Given a gallery of uncaptioned video sequences, this paper considers the...
Current video retrieval efforts all found their evaluation on an instanc...
We propose a three-dimensional discrete and incremental scale to encode ...
We address the problem of cross-modal fine-grained action retrieval betw...
This work introduces verb-only representations for both recognition and ...
This work introduces verb-only representations for actions and interacti...
First-person vision is gaining interest as it offers a unique viewpoint ...
Manual annotations of temporal bounds for object interactions (i.e. star...
This work deviates from easy-to-define class boundaries for object inter...
We present SEMBED, an approach for embedding an egocentric object intera...