To be successful, Vision-and-Language Navigation (VLN) agents must be ab...
We introduce Masked Trajectory Models (MTM) as a generic abstraction for...
We present the largest and most comprehensive empirical study of pre-tra...
We present a single neural network architecture composed of task-agnosti...
We present a scalable approach for learning open-world object-goal navig...
How should we learn visual representations for embodied agents that must...
Natural language instructions for visual navigation often use scene
desc...
We study the challenging problem of releasing a robot in a previously un...
Following a navigation instruction such as 'Walk down the stairs and sto...
We develop a language-guided navigation task set in a continuous 3D
envi...
Visual question answering requires high-order reasoning about an image, ...