Communication-Efficient Distributed Reinforcement Learning
This paper studies the distributed reinforcement learning (DRL) problem involving a central controller and a group of learners. Two DRL settings that find broad applications are considered: multi-agent reinforcement learning (RL) and parallel RL. In both settings, frequent information exchange between the learners and the controller is required. However, for many distributed systems, e.g., parallel machines for training deep RL algorithms and multi-robot systems for learning optimal coordination strategies, the overhead caused by frequent communication is not negligible and becomes a bottleneck for overall performance. To overcome this challenge, we develop a new policy gradient method that is amenable to efficient implementation in such communication-constrained settings. By adaptively skipping policy gradient communication, our method reduces the communication overhead without degrading the learning accuracy. Analytically, we establish that i) the convergence rate of our algorithm matches that of the vanilla policy gradient for the DRL tasks; and ii) if the distributed computing units are heterogeneous in terms of their reward functions and initial state distributions, the number of communication rounds needed to achieve a targeted learning accuracy is reduced. Numerical experiments on a popular multi-agent RL benchmark corroborate the significant communication reduction of our algorithm compared to the alternatives.
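To make the idea of adaptively skipping policy gradient communication concrete, the following is a minimal sketch of one plausible skipping rule: each learner uploads a fresh policy gradient only when it differs sufficiently from the gradient it last uploaded, and the controller otherwise reuses the stale copy. The quadratic surrogate objective, the specific skipping test, and the threshold are illustrative assumptions for this sketch, not the paper's exact criterion or algorithm.

```python
import numpy as np


class Learner:
    """A learner that uploads its policy gradient only when it has changed enough."""

    def __init__(self, dim, threshold=1e-2, seed=0):
        self.rng = np.random.default_rng(seed)
        self.threshold = threshold
        # Local optimum of a stand-in (quadratic) objective; a real learner
        # would instead estimate a policy gradient from sampled trajectories.
        self.local_target = self.rng.standard_normal(dim)
        self.last_uploaded = np.zeros(dim)  # gradient copy the controller currently holds

    def local_policy_gradient(self, theta):
        # Placeholder for a stochastic policy-gradient estimate (e.g., REINFORCE).
        return (self.local_target - theta) + 0.05 * self.rng.standard_normal(theta.shape)

    def maybe_upload(self, theta):
        grad = self.local_policy_gradient(theta)
        # Skip communication if the gradient barely changed since the last upload.
        if np.linalg.norm(grad - self.last_uploaded) <= self.threshold * np.linalg.norm(grad):
            return None  # controller keeps using the stale gradient
        self.last_uploaded = grad
        return grad


def controller_loop(learners, dim, steps=200, lr=0.1):
    """Central controller: aggregate (possibly stale) gradients and update the policy."""
    theta = np.zeros(dim)
    stale = [np.zeros(dim) for _ in learners]
    uploads = 0
    for _ in range(steps):
        for i, learner in enumerate(learners):
            fresh = learner.maybe_upload(theta)
            if fresh is not None:
                stale[i] = fresh
                uploads += 1
        theta = theta + lr * np.mean(stale, axis=0)  # gradient-ascent step on aggregated gradient
    return theta, uploads


if __name__ == "__main__":
    dim, num_learners, steps = 4, 5, 200
    learners = [Learner(dim, seed=s) for s in range(num_learners)]
    theta, uploads = controller_loop(learners, dim, steps=steps)
    print("final parameters:", theta)
    print("gradient uploads:", uploads, "out of", steps * num_learners, "possible")
```

Running this sketch typically shows far fewer uploads than the worst case once the parameters stabilize, which is the qualitative behavior the abstract describes: communication rounds are skipped whenever a learner's gradient has not changed appreciably.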