Advancing object detection to open-vocabulary and few-shot transfer has ...
Vision-Language Pretraining (VLP) models have recently successfully
faci...
As an important task in multimodal context understanding, Text-VQA (Visu...
Recent advances in the areas of multimodal machine learning and artifici...
Although open-domain question answering (QA) draws great attention in re...
Text-to-image retrieval is an essential task in multi-modal information
...
We introduce SPARTA, a novel neural retrieval method that shows great pr...