Referring Expression Generation (REG) aims to generate unambiguous Refer...
Existing Visual Question Answering (VQA) models have explored various vi...
Most existing approaches to Visual Question Answering (VQA) answer quest...
Visual dialog has witnessed great progress after introducing various vis...
Multimodal pre-training models, such as LXMERT, have achieved excellent ...
A goal-oriented visual dialogue involves multi-turn interactions between...
This paper describes our solution to the multi-modal learning challenge ...
The ICML 2013 Workshop on Challenges in Representation Learning focused ...