We introduce Housekeep, a benchmark to evaluate commonsense reasoning in...
We study the problem of synthesizing immersive 3D indoor scenes from one...
Natural language instructions for visual navigation often use scene desc...
It is fundamental for personal robots to reliably navigate to a specifie...
Recent Visual Question Answering (VQA) models have shown impressive perf...
Textual cues are essential for everyday tasks like buying groceries and ...
Diverse and accurate vision+language modeling is an important goal to re...
We introduce EvalAI, an open source platform for evaluating and comparin...
Image captioning models have achieved impressive results on datasets con...
We present Fabrik, an online neural network editor that provides tools t...
Temporal common sense has applications in AI tasks such as QA, multi-doc...
We conduct large-scale studies on `human attention' in Visual Question A...
We are witnessing a proliferation of massive visual data. Unfortunately ...