Zhiliang Peng

research

∙ 06/26/2023

Kosmos-2: Grounding Multimodal Large Language Models to the World

We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enablin...

Zhiliang Peng, et al.

research

∙ 02/28/2023

Generic-to-Specific Distillation of Masked Autoencoders

Large vision Transformers (ViTs) driven by self-supervised pre-training ...

Wei Huang, et al.

research

∙ 10/19/2022

A Unified View of Masked Image Modeling

Masked image modeling has demonstrated great potential to eliminate the ...

Zhiliang Peng, et al.

research

∙ 10/12/2022

Foundation Transformers

A big convergence of model architectures across language, vision, speech...

Hongyu Wang, et al.

research

∙ 08/22/2022

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

A big convergence of language, vision, and multimodal pretraining is eme...

Wenhui Wang, et al.

research

∙ 08/12/2022

BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers

Masked image modeling (MIM) has demonstrated impressive results in self-...

Zhiliang Peng, et al.

research

∙ 05/19/2022

Integral Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

Modern object detectors have taken the advantages of pre-trained vision ...

Xiaosong Zhang, et al.

research

∙ 10/06/2021

Long-tailed Distribution Adaptation

Recognizing images with long-tailed distributions remains a challenging ...

Zhiliang Peng, et al.

research

∙ 05/09/2021

Conformer: Local Features Coupling Global Representations for Visual Recognition

Within Convolutional Neural Network (CNN), the convolution operations ar...

Zhiliang Peng, et al.

research

∙ 03/27/2021

TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization

Weakly supervised object localization (WSOL) is a challenging problem wh...

Wei Gao, et al.
