Text-based Visual Question Answering (TextVQA) aims at answering questio...
Large Language Models (LLMs) have revolutionized natural language proces...
Most graph-to-text works are built on the encoder-decoder framework with...
Texts in scene images convey critical information for scene understandin...
Table-to-text generation aims at automatically generating natural text t...
Modern object detection methods based on convolutional neural network su...
We propose a novel self-supervised method, referred to as Video Cloze Pr...