Transformers are widely used for solving tasks in natural language
proce...
Visual question answering is a vision-and-language multimodal task, that...
There have been a number of techniques that have demonstrated the genera...
One of the major limitations of deep learning models is that they face
c...
Generating natural questions from an image is a semantic task that requi...
In this paper, we propose a method to obtain robust explanations for vis...
In this paper, we propose a method for obtaining sentence-level embeddin...
In this paper, we consider the problem of solving semantic tasks such as...
In this paper, we aim to obtain improved attention for a visual question...
In order to successfully perform tasks specified by natural language
ins...
Vision and language tasks have benefited from attention. There have been...
In this paper, we propose a probabilistic framework for solving the task...
Understanding and explaining deep learning models is an imperative task....
Generating natural questions from an image is a semantic task that requi...
In this paper, we propose a method for obtaining sentence-level embeddin...