Document understanding refers to automatically extracting, analyzing and comp...
Large language models (LLMs) have demonstrated impressive zero-shot abil...
In this paper, we present ChatPLUG, a Chinese open-domain dialogue syste...
Large-scale pretrained foundation models have been an emerging paradigm ...
Visual grounding focuses on establishing fine-grained alignment between ...
The Visual Question Answering (VQA) task utilizes both visual image and ...
Existing approaches to vision-language pre-training (VLP) heavily rely o...
Generating fluent and informative responses is of critical importance fo...
Attention plays a key role in the improvement of sequence-to-sequence-ba...
It is a challenging and practical research problem to obtain effective c...