Visual document understanding is a complex task that involves analyzing ...
A big convergence of language, multimodal perception, action, and world
...
A big convergence of language, vision, and multimodal pretraining is
eme...
We propose a novel algorithm for compressive imaging that exploits both ...