Model-free Reinforcement Learning for Robust Locomotion Using Trajectory Optimization for Exploration

07/14/2021, by Miroslav Bogdanovic, et al.

In this work, we present a general two-stage reinforcement learning approach for going from a single demonstration trajectory to a robust policy that can be deployed on hardware without any additional training. In the first stage, the demonstration is used as a starting point to facilitate initial exploration. In the second stage, the relevant task reward is optimized directly and a policy robust to environment uncertainties is computed. We demonstrate and examine in detail the performance and robustness of our approach on highly dynamic hopping and bounding tasks on a real quadruped robot.
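As a rough illustration of the two-stage scheme described above (not the authors' actual implementation), the sketch below trains a toy one-dimensional "hopper" policy with a simple random-search optimizer standing in for model-free RL: stage one rewards tracking a single demonstration trajectory, and stage two directly optimizes a task reward (mean hopping height) under randomized dynamics. The environment, policy parameterization, reward functions, and optimizer here are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single demonstration: a desired height profile over 50 steps.
demo = np.sin(np.linspace(0, 2 * np.pi, 50)) + 1.0

def rollout(policy, dyn_noise=0.0, steps=50):
    """Simulate a toy hopper; returns the visited heights."""
    h, traj = 1.0, []
    for t in range(steps):
        a = policy(h, t)                                  # thrust action
        h = 0.9 * h + 0.1 * a + rng.normal(0.0, dyn_noise)  # toy dynamics
        traj.append(h)
    return np.array(traj)

def imitation_reward(traj):
    # Stage 1: negative tracking error w.r.t. the demonstration.
    return -np.mean((traj - demo) ** 2)

def task_reward(traj):
    # Stage 2: direct task objective, e.g. mean hopping height.
    return np.mean(traj)

def make_policy(p):
    # Feedforward + feedback tracking controller (illustrative choice).
    return lambda h, t: p[0] * demo[t] + p[1] * (demo[t] - h)

def train(reward_fn, theta, iters=200, noise=0.1, lr=0.05, dyn_noise=0.0):
    """Antithetic random-search updates, a stand-in for model-free RL."""
    for _ in range(iters):
        eps = rng.normal(0.0, noise, size=theta.shape)
        r_plus = reward_fn(rollout(make_policy(theta + eps), dyn_noise))
        r_minus = reward_fn(rollout(make_policy(theta - eps), dyn_noise))
        theta = theta + lr * (r_plus - r_minus) * eps
    return theta

theta = np.zeros(2)
theta = train(imitation_reward, theta)              # stage 1: demo-guided exploration
theta = train(task_reward, theta, dyn_noise=0.05)   # stage 2: task reward, randomized dynamics
```

The key idea mirrored here is that stage one only needs the demonstration to bootstrap exploration; stage two then discards the imitation objective and optimizes the task reward under perturbed dynamics, which is what yields a policy robust enough for hardware deployment.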
