Images or videos captured by the Under-Display Camera (UDC) suffer from
...
Most existing forecasting systems are memory-based methods, which attemp...
View Transformation Module (VTM), where transformations happen between
m...
We present the All-Seeing (AS) project: a large-scale data and model for...
The combination of audio and vision has long been a topic of interest in...
Image restoration in adverse weather conditions is a difficult task in
c...
With the exponential growth of video data, there is an urgent need for
a...
This paper presents a novel transformer architecture for graph represent...
Large language models (LLMs) have notably accelerated progress towards
a...
Action detection is a challenging video understanding task, requiring
mo...
We propose a simple, efficient, yet powerful framework for dense visual
...
In this report, we present our champion solution to the WSDM2023 Toloka
...
As the quality of optical sensors improves, there is a need for processi...
Image restoration under hazy weather condition, which is called single i...
In this report, we present our champion solutions to five tracks at Ego4...
Capturing the state changes of interacting objects is a key technology f...
We provide the technical report for Ego4D audio-only diarization challen...
Compared to the great progress of large-scale vision transformers (ViTs)...
Face Restoration (FR) aims to restore High-Quality (HQ) faces from
Low-Q...
StarCraft II (SC2) poses a grand challenge for reinforcement learning (R...
Incremental few-shot semantic segmentation (IFSS) targets at incremental...
Point cloud completion has become increasingly popular among generation ...
This work investigates a simple yet powerful adapter for Vision Transfor...
The transductive inference is an effective technique in the few-shot lea...
Temporal action detection (TAD) is extensively studied in the video
unde...
3D visual perception tasks, including 3D detection and map segmentation ...
Point normal, as an intrinsic geometric property of 3D objects, not only...
Temporal action detection aims to locate the boundaries of action in the...
We propose an accurate and efficient scene text detection framework, ter...
Deep-learning based Super-Resolution (SR) methods have exhibited promisi...
Recent approaches for end-to-end text spotting have achieved promising
r...
We present Panoptic SegFormer, a general framework for end-to-end panopt...
Few-shot learning aims to recognize new categories using very few labele...
Convolution on 3D point clouds that generalized from 2D grid-like domain...
StarCraft II (SC2) is a real-time strategy game, in which players produc...
We present an extremely simple Ultra-Resolution Style Transfer framework...
Recent deep-learning based Super-Resolution (SR) methods have achieved
r...
Scene text spotting aims to detect and recognize the entire word or sent...
The existing action recognition methods are mainly based on clip-level
c...
In this paper, we proposed a pipeline for inferring the relationship of ...
Detecting text located on the torsos of marathon runners and sports play...
Temporal modeling is crucial for capturing spatiotemporal structure in v...
Efficiency is an important issue in designing video architectures for ac...
Scene text detection, an important step of scene text reading systems, h...
StarCraft II provides an extremely challenging platform for reinforcemen...
StarCraft II poses a grand challenge for reinforcement learning. The mai...
Identifying crime for forensic investigating teams when crimes involve p...
The challenges of shape robust text detection lie in two aspects: 1) mos...
Basing on the analysis by revealing the equivalence of modern networks, ...