Vision-Language Pre-training (VLP) methods based on object detection enj...
Vision Transformer (ViT) based Vision-Language Pre-training (VLP) models...
Recent years have witnessed a big convergence of language, vision, and m...
Large-scale pretrained foundation models have been an emerging paradigm ...
The Visual Question Answering (VQA) task utilizes both visual image and ...
Existing approaches to vision-language pre-training (VLP) heavily rely o...
Vision-language pre-training (VLP) on large-scale image-text pairs has a...
Large pre-trained language models achieve state-of-the-art results when ...
Vision-language pre-training (VLP) on large-scale image-text pairs has r...
Learning to control the structure of sentences is a challenging problem ...
Recent studies about learning multilingual representations have achieved...
Self-supervised pre-training has emerged as a powerful technique for nat...
The ability of semantic reasoning over the sentence pair is essential fo...
Commonsense and background knowledge is required for a QA model to answe...
Recently, the pre-trained language model, BERT (Devlin et al. (2018)Devli...
A fundamental trade-off between effectiveness and efficiency needs to be...
This paper proposes a novel neural machine reading model for open-domain...
Previous studies have demonstrated the empirical success of word embeddi...