We study how vision-language models trained on Internet-scale data can b...
We present a framework for robot skill acquisition, which 1) efficiently...
We observe that pre-trained large language models (LLMs) are capable of
...
Large language models excel at a wide range of complex tasks. However,
e...
Recent progress in large language models (LLMs) has demonstrated the abi...
Expert demonstrations are a rich source of supervision for training visu...
We present a differentiable formulation of rigid-body contact dynamics f...
Humans form mental images of 3D scenes to support counterfactual imagina...
We present a framework for building interactive, real-time, natural
lang...
Deformable objects manipulation can benefit from representations that
se...
Large language models (LLMs) trained on code completion have been shown ...
Recent works have shown how the reasoning capabilities of Large Language...
It is a long-standing problem to find effective representations for trai...
Perceptual understanding of the scene and the relationship between its
d...
Large foundation models can exhibit unique capabilities depending on the...
Action representation is an important yet often overlooked aspect in
end...
Thin, reflective objects such as forks and whisks are common in our dail...
Deformable object manipulation requires computationally efficient
repres...
We present a framework for bi-level trajectory optimization in which a
s...
We find that across a wide range of robot policy learning scenarios, tre...
We investigate the visual cross-embodiment imitation setting, in which a...
We present iNeRF, a framework that performs pose estimation by "invertin...
Rearranging and manipulating deformable objects such as cables, fabrics,...
Robotic manipulation can be formulated as inducing a sequence of spatial...
Predictive models have been at the core of many robotic systems, from
qu...