Deep learning models often rely only on a small set of features even whe...
Composed image retrieval aims to find an image that best matches a given...
Whether Transformers can learn to apply symbolic rules and generalise to...
Composed image retrieval searches for a target image based on a multi-mo...
Neural networks trained with ERM (empirical risk minimization) sometimes...
We extend the task of composed image retrieval, where an input query con...
We introduce an evaluation methodology for visual question answering (VQ...
The limits of applicability of vision-and-language models are defined by...
We present a novel mechanism to embed prior knowledge in a model for vis...
One of the primary challenges limiting the applicability of deep learnin...
The knowledge that humans hold about a problem often extends far beyond ...
One of the primary challenges faced by deep learning is the degree to wh...
One of the key limitations of traditional machine learning methods is th...
The predominant approach to Visual Question Answering (VQA) demands that...
A robot that can carry out a natural-language instruction has been a dre...
This paper presents a state-of-the-art model for visual question answeri...
Top-down visual attention mechanisms have been used extensively in image...
Part of the appeal of Visual Question Answering (VQA) is its promise to ...
This paper proposes to improve visual question answering (VQA) with stru...
This paper shows how to extract dense optical flow from videos with a co...