Action recognition is an important problem that requires identifying act...
Self-supervised video representation learning has been shown to effectiv...
The recent success of deep learning applications has coincided with thos...
Group Activity Recognition (GAR) detects the activity performed by a gro...
Conditional Generative Adversarial Networks (cGANs) extend the standard
...
This paper considers the problem of spatiotemporal object-centric reason...
We propose a sequential variational autoencoder to learn disentangled
re...
Pose tracking is an important problem that requires identifying unique h...
In this paper, we introduce a contextual grounding approach that capture...
Localizing moments in untrimmed videos via language queries is a new and...
Existing visual reasoning datasets such as Visual Question Answering (VQ...
We introduce a new inference task - Visual Entailment (VE) - which diffe...
Existing entailment datasets mainly pose problems which can be answered
...
We present Adaptive Memory Networks (AMN) that processes input-question ...
In recent years, many techniques have been developed to improve the
perf...
We address the problem of video captioning by grounding language generat...
Human actions often involve complex interactions across several inter-re...
Neural network based sequence-to-sequence models in an encoder-decoder
f...
The success of CNNs in various applications is accompanied by a signific...