Recently, increasing efforts have been focused on Weakly Supervised Scen...
Given an image and a reference caption, the image caption editing task a...
The prevailing framework for matching multimodal inputs is based on a tw...
The prevailing framework for solving referring expression grounding is b...
Area under ROC curve (AUC) is a widely used performance measure for clas...