Existing image editing tools, while powerful, typically disregard the
un...
Natural language processing and 2D vision models have attained remarkabl...
Many pixelwise dense prediction tasks, such as depth estimation and semantic
segm...
In the last year alone, a surge of new benchmarks to measure composition...
We propose Neural Priming, a technique for adapting large pretrained mod...
As general purpose vision models get increasingly effective at a wide se...
Compositional representations of the world are a promising step towards
...
Massive data corpora like WebText, Wikipedia, Conceptual Captions,
WebIm...
Training embodied agents in simulation has become mainstream for the emb...
Training effective embodied AI agents often involves manual reward
engin...
Remote sensing images are useful for a wide variety of environmental and...
We present VISPROG, a neuro-symbolic approach to solving complex and
com...
Many high-level skills that are required for computer vision tasks, such...
We propose Unified-IO, a model that performs a large variety of AI tasks...
Today's state-of-the-art visual navigation agents typically consist of l...
Massive datasets and high-capacity models have driven many recent
advanc...
Computer vision models excel at making predictions when the test distrib...
Object manipulation is a critical skill required for Embodied AI agents
...
Embodied AI has seen steady progress across a diverse set of independent...
General purpose vision (GPV) systems are models that are designed to sol...
Communicating with humans is challenging for AIs because it requires a s...
Contrastive language image pretraining (CLIP) encoders have been shown t...
As an attempt towards assessing the robustness of embodied navigation ag...
Convolutional neural networks (CNNs) are ubiquitous in computer vision, ...
We propose PIGLeT: a model that learns physical commonsense knowledge th...
The domain of Embodied AI has recently witnessed substantial progress,
p...
While deep reinforcement learning (RL) promises freedom from hand-labele...
We propose a new framework for understanding and representing related sa...
A special purpose learning system assumes knowledge of admissible tasks ...
There has been significant recent progress in the field of Embodied AI...
Mirroring the success of masked language models, vision-and-language
cou...
The domain of Embodied AI, in which agents learn to complete tasks throu...
Why do agents often obtain better reinforcement learning policies when
i...
Autonomous agents must learn to collaborate. It is not scalable to devel...
Enabling robust intelligence in the wild entails learning systems that o...
We present the Supermasks in Superposition (SupSup) model, capable of
se...
We revisit the problem of Object-Goal Navigation (ObjectNav). In its sim...
We present a general computational approach that enables a machine to
ge...
Much of the remarkable progress in computer vision has been focused arou...
Visual recognition ecosystems (e.g. ImageNet, Pascal, COCO) have undenia...
We introduce Grounded Situation Recognition (GSR), a task that requires
...
The ubiquity of embodied gameplay, observed in a wide variety of animal
...
Training a neural network is synonymous with learning the values of the
...
Collaboration is a necessary skill to perform tasks that are beyond one
...
Scale variation has been a challenge from traditional to modern approach...
Imagining a scene described in natural language with realistic layout an...
We introduce Interactive Question Answering (IQA), the task of answering...
Diagrams often depict complex phenomena and serve as a good test bed for...
A number of studies have found that today's Visual Question Answering (V...
Visual Question Answering (VQA) has received a lot of attention over the...