Although reinforcement learning has found widespread use in dense reward...
We propose learnable polyphase sampling (LPS), a pair of learnable
down/...
Recent work on 3D-aware image synthesis has achieved compelling results ...
Recent advances in pre-training vision-language models like CLIP have sh...
Single-view RGB-D human reconstruction with implicit functions is often
...
While recovery of geometry from image and video data has received a lot ...
We present XMem, a video object segmentation architecture for long video...
We introduce an approach for selecting objects in neural volumetric 3D
r...
To solve difficult tasks, humans ask questions to acquire knowledge from...
Optimization within a layer of a deep-net has emerged as a new direction...
Designing equivariance as an inductive bias into deep-nets has been a
pr...
We find Mask2Former also achieves state-of-the-art performance on video
...
We introduce REDO, a class-agnostic framework to REconstruct the Dynamic...
Image segmentation is about grouping pixels with different semantics, e....
Solving complex real-world tasks, e.g., autonomous fleet control, often
...
Exploration is critical for good results in deep reinforcement learning ...
Modern approaches typically formulate semantic segmentation as a per-pix...
Classical adversarial training (AT) frameworks are designed to achieve h...
Extracting detailed 3D information of objects from video data is an impo...
We introduce WyPR, a Weakly-supervised framework for Point cloud Recogni...
The recent growth of web video sharing platforms has increased the deman...
Controllable semantic image editing enables a user to change entire imag...
Deep reinforcement learning (RL) is computationally demanding and requir...
Existing work on object detection often relies on a single form of
annot...
Existing semi-supervised learning (SSL) algorithms use a single weight t...
Combinatorial optimization is frequently used in computer vision. For
in...
Weakly supervised learning has emerged as a compelling tool for object
d...
We propose Chirality Nets, a family of deep nets that is equivariant to ...
Sample efficiency and scalability to a large number of agents are two
im...
Inferring the most likely configuration for a subset of variables of a j...
Reasoning is an important ability that we learn from a very early age. Y...
We present an empirical evaluation of fMRI data augmentation via synthes...
A zoo of deep nets is available these days for almost any given task, an...
Accurately answering a question about a given image requires combining
o...
Colorizing a given gray-level image is an important task in the media an...
Question answering is an important task for autonomous agents and virtua...
Unsupervised video segmentation plays an important role in a wide variet...
Video object segmentation is challenging yet important in a wide variety...
Paragraph generation from images, which has gained popularity recently, ...
Textual grounding is an important but challenging task for human-compute...
Instance level video object segmentation is an important technique for v...
Textual grounding, i.e., linking words to objects in images, is a challe...
This paper explores image caption generation using conditional variation...
The quest for algorithms that enable cognitive abilities is an important...
We propose a novel training algorithm for reinforcement learning which
c...
Semantic image inpainting is a challenging task where large missing regi...
In this paper we tackle the problem of instance-level segmentation and d...
Convolutional neural networks with many layers have recently been shown ...