Zhan Tong

research

∙ 05/23/2023

TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale

The ultimate goal for foundation models is realizing task-agnostic, i.e....

Ziyun Zeng, et al.

research

∙ 04/17/2023

Efficient Video Action Detection with Token Dropout and Context Refinement

Streaming video clips with large-scale video tokens impede vision transf...

Lei Chen, et al.

research

∙ 04/07/2023

SparseFormer: Sparse Visual Recognition via Limited Latent Tokens

Human visual recognition is a sparse process, where only a few salient v...

Ziteng Gao, et al.

research

∙ 03/30/2023

Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning

Contrastive learning methods train visual encoders by comparing views fr...

Chongjian Ge, et al.

research

∙ 03/29/2023

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

Scale is the primary factor for building a powerful foundation model tha...

Limin Wang, et al.

research

∙ 03/28/2023

CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection

The relation modeling between actors and scene context advances video ac...

Lei Chen, et al.

research

∙ 05/26/2022

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

Although the pre-trained Vision Transformers (ViTs) achieved great succe...

Shoufa Chen, et al.

research

∙ 03/23/2022

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Pre-training video transformers on extra large-scale datasets is general...

Zhan Tong, et al.

research

∙ 02/16/2022

Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations

Vision Transformers (ViTs) take all the image patches as tokens and cons...

Youwei Liang, et al.

research

∙ 04/20/2021

MGSampler: An Explainable Sampling Strategy for Video Action Recognition

Frame sampling is a fundamental problem in video action recognition due ...

Yuan Zhi, et al.

research

∙ 12/18/2020

TDN: Temporal Difference Networks for Efficient Action Recognition

Temporal modeling still remains challenging for action recognition in vi...

Limin Wang, et al.

