We present Kosmos-2.5, a multimodal literate model for machine reading o...
A big convergence of language, multimodal perception, action, and world ...
The surge of pre-training has witnessed the rapid development of documen...
Self-supervised pre-training techniques have achieved remarkable progres...
Image Transformer has recently achieved significant progress for natural...
Document AI, or Document Intelligence, is a relatively new research topi...
Text recognition is a long-standing research problem for document digita...
Video transcript summarization is a fundamental task for video understan...
Multimodal pre-training with text, layout, and image has achieved SOTA p...
Pre-training of text and layout has proved effective in a variety of vis...