Document understanding refers to automatically extracting, analyzing, and
comp...
To promote the development of Vision-Language Pre-training (VLP) and
mul...
Large language models (LLMs) have demonstrated impressive zero-shot abil...
Recent years have witnessed a big convergence of language, vision, and
m...
Large-scale pretrained foundation models have been an emerging paradigm ...
Visual grounding focuses on establishing fine-grained alignment between
...
Inferring the substitutable and complementary products for a given produ...