Video-based scene graph generation (VidSGG) is an approach that aims to
...
Predicting the impact of publications in science and technology has beco...
Distinctive Image Captioning (DIC) – generating distinctive captions tha...
Given an image and a reference caption, the image caption editing task a...
Today's VidSGG models are all proposal-based methods, i.e., they first
g...
Given an untrimmed video and a natural language query, Natural Language ...
With recent advances in distantly supervised (DS) relation extraction (R...
Weakly-Supervised Object Detection (WSOD) and Localization (WSOL), i.e.,...
The prevailing framework for matching multimodal inputs is based on a
tw...
We aim to address the problem of Natural Language Video Localization
(NL...
Visual attention has been successfully applied in structural prediction ...