We present Kosmos-2.5, a multimodal literate model for machine reading o...
A big convergence of language, multimodal perception, action, and world ...
Bayesian Optimization (BO) is a common solution to search optimal hyperp...
The surge of pre-training has witnessed the rapid development of documen...
Anomaly detection with only prior knowledge from normal samples attracts...
Despite the rapid advance of unsupervised anomaly detection, existing me...
While achieving remarkable success for medical image segmentation, deep ...
Self-supervised pre-training techniques have achieved remarkable progres...
Image Transformer has recently achieved significant progress for natural...
Few-shot counting aims to count objects of any class in an image given o...
Document AI, or Document Intelligence, is a relatively new research topi...
Multimodal pre-training with text, layout, and image has made significan...
Text recognition is a long-standing research problem for document digita...
Reading order detection is the cornerstone to understanding visually-ric...
Video transcript summarization is a fundamental task for video understan...
Data sampling acts as a pivotal role in training deep learning models. H...
Multimodal pre-training with text, layout, and image has achieved SOTA p...
Pre-training of text and layout has proved effective in a variety of vis...
Document layout analysis usually relies on computer vision models to und...
Automatic live commenting aims to provide real-time comments on videos f...
Pre-training techniques have been verified successfully in a variety of ...
Multi-label image and video classification are fundamental yet challengi...
We present TableBank, a new image-based table detection and recognition ...
Article comments can provide supplementary opinions and facts for reader...
We introduce the task of automatic live commenting. Live commenting, whi...
In this paper, we study a novel task that learns to compose music from n...
Dialogue systems are usually built on either generation-based or retriev...
Conventional Open Information Extraction (Open IE) systems are usually b...