Tong Lu

research

∙ 09/09/2023

Deep Video Restoration for Under-Display Camera

Images or videos captured by the Under-Display Camera (UDC) suffer from ...

0 Xuanxi Chen, et al. ∙

research

∙ 08/15/2023

Memory-and-Anticipation Transformer for Online Action Understanding

Most existing forecasting systems are memory-based methods, which attemp...

0 Jiahao Wang, et al. ∙

research

∙ 08/04/2023

FB-BEV: BEV Representation from Forward-Backward View Transformations

View Transformation Module (VTM), where transformations happen between m...

0 Zhiqi Li, et al. ∙

research

∙ 08/03/2023

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

We present the All-Seeing (AS) project: a large-scale data and model for...

0 Weiyun Wang, et al. ∙

research

∙ 07/03/2023

AVSegFormer: Audio-Visual Segmentation with Transformer

The combination of audio and vision has long been a topic of interest in...

0 Shengyi Gao, et al. ∙

research

∙ 05/29/2023

GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions

Image restoration in adverse weather conditions is a difficult task in c...

0 PetsTime, et al. ∙

research

∙ 05/22/2023

VideoLLM: Modeling Video Sequence with Large Language Models

With the exponential growth of video data, there is an urgent need for a...

0 Guo Chen, et al. ∙

research

∙ 05/19/2023

Graph Propagation Transformer for Graph Representation Learning

This paper presents a novel transformer architecture for graph represent...

0 Zhe Chen, et al. ∙

research

∙ 05/18/2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Large language models (LLMs) have notably accelerated progress towards a...

0 Wenhai Wang, et al. ∙

research

∙ 04/24/2023

MRSN: Multi-Relation Support Network for Video Action Detection

Action detection is a challenging video understanding task, requiring mo...

0 Yin-Dong Zheng, et al. ∙

research

∙ 03/30/2023

DDP: Diffusion Model for Dense Visual Prediction

We propose a simple, efficient, yet powerful framework for dense visual ...

0 Yuanfeng Ji, et al. ∙

research

∙ 01/22/2023

Champion Solution for the WSDM2023 Toloka VQA Challenge

In this report, we present our champion solution to the WSDM2023 Toloka ...

0 Shengyi Gao, et al. ∙

research

∙ 12/22/2022

Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method

As the quality of optical sensors improves, there is a need for processi...

0 PetsTime, et al. ∙

research

∙ 12/22/2022

Restoring Vision in Hazy Weather with Hierarchical Contrastive Learning

Image restoration under hazy weather condition, which is called single i...

0 PetsTime, et al. ∙

research

∙ 11/17/2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges

In this report, we present our champion solutions to five tracks at Ego4...

0 Guo Chen, et al. ∙

research

∙ 11/16/2022

Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022

Capturing the state changes of interacting objects is a key technology f...

0 Yin-Dong Zheng, et al. ∙

research

∙ 11/16/2022

Exploring Detection-based Method For Speaker Diarization @ Ego4D Audio-only Diarization Challenge 2022

We provide the technical report for Ego4D audio-only diarization challen...

0 Jiahao Wang, et al. ∙

research

∙ 11/10/2022

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Compared to the great progress of large-scale vision transformers (ViTs)...

0 Wenhai Wang, et al. ∙

research

∙ 11/05/2022

A Survey of Deep Face Restoration: Denoise, Super-Resolution, Deblur, Artifact Removal

Face Restoration (FR) aims to restore High-Quality (HQ) faces from Low-Q...

0 PetsTime, et al. ∙

research

∙ 09/23/2022

On Efficient Reinforcement Learning for Full-length Game of StarCraft II

StarCraft II (SC2) poses a grand challenge for reinforcement learning (R...

12 Ruo-Ze Liu, et al. ∙

research

∙ 07/26/2022

Incremental Few-Shot Semantic Segmentation via Embedding Adaptive-Update and Hyper-class Representation

Incremental few-shot semantic segmentation (IFSS) targets at incremental...

7 Guangchen Shi, et al. ∙

research

∙ 07/21/2022

SeedFormer: Patch Seeds based Point Cloud Completion with Upsample Transformer

Point cloud completion has become increasingly popular among generation ...

0 Haoran Zhou, et al. ∙

research

∙ 05/17/2022

Vision Transformer Adapter for Dense Predictions

This work investigates a simple yet powerful adapter for Vision Transfor...

8 Zhe Chen, et al. ∙

research

∙ 05/17/2022

Uncertainty-based Network for Few-shot Image Classification

The transductive inference is an effective technique in the few-shot lea...

0 Minglei Yuan, et al. ∙

research

∙ 05/05/2022

BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection

Temporal action detection (TAD) is extensively studied in the video unde...

0 Min Yang, et al. ∙

research

∙ 03/31/2022

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

3D visual perception tasks, including 3D detection and map segmentation ...

0 Zhiqi Li, et al. ∙

research

∙ 03/23/2022

Refine-Net: Normal Refinement Neural Network for Noisy Point Clouds

Point normal, as an intrinsic geometric property of 3D objects, not only...

1 Haoran Zhou, et al. ∙

research

∙ 12/07/2021

DCAN: Improving Temporal Action Detection via Dual Context Aggregation

Temporal action detection aims to locate the boundaries of action in the...

0 Guo Chen, et al. ∙

research

∙ 11/03/2021

FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

We propose an accurate and efficient scene text detection framework, ter...

5 Zhe Chen, et al. ∙

research

∙ 10/23/2021

Spectrum-to-Kernel Translation for Accurate Blind Image Super-Resolution

Deep-learning based Super-Resolution (SR) methods have exhibited promisi...

0 Guangpin Tao, et al. ∙

research

∙ 10/20/2021

ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter

Recent approaches for end-to-end text spotting have achieved promising r...

0 Humen Zhong, et al. ∙

research

∙ 09/08/2021

Panoptic SegFormer

We present Panoptic SegFormer, a general framework for end-to-end panopt...

8 Zhiqi Li, et al. ∙

research

∙ 08/25/2021

Learning Class-level Prototypes for Few-shot Learning

Few-shot learning aims to recognize new categories using very few labele...

7 Minglei Yuan, et al. ∙

research

∙ 08/18/2021

Adaptive Graph Convolution for Point Cloud Analysis

Convolution on 3D point clouds that generalized from 2D grid-like domain...

0 Haoran Zhou, et al. ∙

research

∙ 04/14/2021

An Introduction of mini-AlphaStar

StarCraft II (SC2) is a real-time strategy game, in which players produc...

0 Ruo-Ze Liu, et al. ∙

research

∙ 03/22/2021

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

We present an extremely simple Ultra-Resolution Style Transfer framework...

7 Zhe Chen, et al. ∙

research

∙ 12/18/2020

Frequency Consistent Adaptation for Real World Super Resolution

Recent deep-learning based Super-Resolution (SR) methods have achieved r...

6 Xiaozhong Ji, et al. ∙

research

∙ 08/03/2020

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

Scene text spotting aims to detect and recognize the entire word or sent...

0 Wenhai Wang, et al. ∙

research

∙ 06/28/2020

Dynamic Sampling Networks for Efficient Action Recognition in Videos

The existing action recognition methods are mainly based on clip-level c...

0 Yin-Dong Zheng, et al. ∙

research

∙ 06/16/2020

Channel Relationship Prediction with Forget-Update Module for Few-shot Classification

In this paper, we proposed a pipeline for inferring the relationship of ...

0 Minglei Yuan, et al. ∙

research

∙ 05/26/2020

A New Unified Method for Detecting Text from Marathon Runners and Sports Players in Video

Detecting text located on the torsos of marathon runners and sports play...

5 Sauradip Nag, et al. ∙

research

∙ 05/14/2020

TAM: Temporal Adaptive Module for Video Recognition

Temporal modeling is crucial for capturing spatiotemporal structure in v...

0 Zhaoyang Liu, et al. ∙

research

∙ 11/21/2019

TEINet: Towards an Efficient Architecture for Video Recognition

Efficiency is an important issue in designing video architectures for ac...

0 Zhaoyang Liu, et al. ∙

research

∙ 08/16/2019

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

Scene text detection, an important step of scene text reading systems, h...

2 Wenhai Wang, et al. ∙

research

∙ 03/02/2019

Efficient Reinforcement Learning with a Mind-Game for Full-Length StarCraft II

StarCraft II provides an extremely challenging platform for reinforcemen...

0 Ruo-Ze Liu, et al. ∙

research

∙ 09/23/2018

On Reinforcement Learning for Full-length Game of StarCraft

StarCraft II poses a grand challenge for reinforcement learning. The mai...

0 Zhen-Jia Pang, et al. ∙

research

∙ 06/19/2018

A New COLD Feature based Handwriting Analysis for Ethnicity/Nationality Identification

Identifying crime for forensic investigating teams when crimes involve p...

2 Sauradip Nag, et al. ∙

research

∙ 06/07/2018

Shape Robust Text Detection with Progressive Scale Expansion Network

The challenges of shape robust text detection lie in two aspects: 1) mos...

0 Xiang Li, et al. ∙

research

∙ 02/06/2018

Mixed Link Networks

Basing on the analysis by revealing the equivalence of modern networks, ...

0 Wenhai Wang, et al. ∙

