Vision-language (VL) pre-training (VLP) has been shown to generalize well to VL...
Weakly-supervised vision-language (V-L) pre-training (W-VLP) aims at lea...
Image Difference Captioning (IDC) aims at generating sentences to descri...
Predicting a scene graph that captures visual entities and their interac...