Structured agents for physical construction
Physical construction -- the ability to compose objects, subject to physical dynamics, in order to serve some function -- is fundamental to human intelligence. Here we introduce a suite of challenging physical construction tasks inspired by how children play with blocks, such as matching a target configuration, stacking and attaching blocks to connect objects together, and creating shelter-like structures over target objects. We then examine how a range of modern deep reinforcement learning agents fare on these challenges, and introduce several new approaches which provide superior performance. Our results show that agents which use structured representations (e.g., objects and scene graphs) and structured policies (e.g., object-centric actions) outperform those which use less structured representations, and generalize better beyond their training when asked to reason about larger scenes. Agents which use model-based planning via Monte-Carlo Tree Search also outperform strictly model-free agents in our most challenging construction problems. We conclude that approaches which combine structured representations and reasoning with powerful learning are a key path toward agents that possess rich intuitive physics, scene understanding, and planning.
READ FULL TEXT