Tengchao Lv

research

09/20/2023

Kosmos-2.5: A Multimodal Literate Model

We present Kosmos-2.5, a multimodal literate model for machine reading o...

Tengchao Lv, et al.

research

02/27/2023

Language Is Not All You Need: Aligning Perception with Language Models

A big convergence of language, multimodal perception, action, and world ...

Shaohan Huang, et al.

research

10/06/2022

XDoc: Unified Pre-training for Cross-Format Document Understanding

The surge of pre-training has witnessed the rapid development of documen...

Jingye Chen, et al.

research

04/18/2022

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

Self-supervised pre-training techniques have achieved remarkable progres...

Yupan Huang, et al.

research

03/04/2022

DiT: Self-supervised Pre-training for Document Image Transformer

Image Transformer has recently achieved significant progress for natural...

Junlong Li, et al.

research

11/16/2021

Document AI: Benchmarks, Models and Applications

Document AI, or Document Intelligence, is a relatively new research topi...

Lei Cui, et al.

research

09/21/2021

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

Text recognition is a long-standing research problem for document digita...

Minghao Li, et al.

research

06/10/2021

VT-SSum: A Benchmark Dataset for Video Transcript Segmentation and Summarization

Video transcript summarization is a fundamental task for video understan...

Tengchao Lv, et al.

research

04/18/2021

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

Multimodal pre-training with text, layout, and image has achieved SOTA p...

Yiheng Xu, et al.

research

12/29/2020

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

Pre-training of text and layout has proved effective in a variety of vis...

Yang Xu, et al.

