We explore object detection with two attributes: color and material. The...
Video Question Answering (VidQA) evaluation metrics have been limited to...
We propose a new framework for understanding and representing related sa...
Phrase grounding models localize an object in the image given a referrin...
We explore the task of Video Object Grounding (VOG), which grounds objec...
A phrase grounding system localizes a particular object in an image refe...