We aim to investigate whether end-to-end learning of visual reasoning ca...
We propose a new task and model for dense video object captioning –
dete...
Current state-of-the-art video models process a video clip as a long seq...
Observing the close relationship among panoptic, semantic and instance
s...
Detection Transformer (DETR) directly transforms queries to unique objec...
We present a novel transformer-based architecture for global multi-objec...
Current object detectors are limited in vocabulary size due to the small...
Lidar-based sensing drives current autonomous vehicles. Despite rapid
pr...
We develop a probabilistic interpretation of two-stage object detection....
How do we build a general and broad object detection system? We use all
...
Three-dimensional objects are commonly represented as 3D boxes in a
poin...
Tracking has traditionally been the art of following interest points thr...
Detection identifies objects as axis-aligned boxes in an image. Most
suc...
With the advent of deep learning, object detection drifted from a bottom...
Semantic keypoints provide concise abstractions for a variety of visual
...
In this paper, we introduce a novel unsupervised domain adaptation techn...
In this paper, we study the task of 3D human pose estimation in the wild...
Learning articulated object pose is inherently difficult because the pos...
Previous learning based hand pose estimation methods does not fully expl...