Policy Search in Continuous Action Domains: an Overview
Continuous action policy search, the search for efficient policies in continuous control tasks, is currently the focus of intensive research, driven both by the recent success of deep reinforcement learning algorithms and by the emergence of competitors based on evolutionary algorithms. In this paper, we present a broad survey of policy search methods, incorporating these very different approaches into a common big picture, together with alternatives such as Bayesian optimization and directed exploration methods. The main message of this overview lies in the relationships between the families of methods, but we also outline some factors underlying the sample efficiency properties of the various approaches. To keep this survey as short and didactic as possible, we do not go into the details of the mathematical derivations of the elementary algorithms.