Intrinsically-Motivated Goal-Conditioned Reinforcement Learning in Multi-Agent Environments
How can a population of reinforcement learning agents autonomously learn a diversity of cooperative tasks in a shared environment? In the single-agent paradigm, goal-conditioned policies have been combined with intrinsic motivation mechanisms to endow agents with the ability to master a wide diversity of autonomously discovered goals. Transferring this idea to cooperative multi-agent systems (MAS) raises a challenge: intrinsically motivated agents that sample goals independently settle on a shared cooperative goal only with low probability, which impairs their learning performance. In this work, we propose a new learning paradigm for modeling such settings, the Decentralized Intrinsically Motivated Skill Acquisition Problem (Dec-IMSAP), and employ it to solve cooperative navigation tasks. Agents in a Dec-IMSAP are trained in a fully decentralized way, in contrast to previous contributions in multi-goal MAS that rely on a centralized goal-selection mechanism. Our empirical analysis indicates that a sufficient condition for efficiently learning a diversity of cooperative tasks is to ensure that the group aligns its goals, i.e., the agents pursue the same cooperative goal and learn to coordinate their actions through specialization. We introduce the Goal-coordination game, a fully decentralized emergent-communication algorithm in which goal alignment emerges from the maximization of individual rewards in multi-goal cooperative environments, and show that it matches the performance of a centralized-training baseline that guarantees aligned goals. To our knowledge, this is the first contribution addressing the problem of intrinsically motivated multi-agent goal exploration in a decentralized training paradigm.
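The misalignment problem the abstract points to can be made concrete with a back-of-the-envelope calculation: if N agents each sample one of G goals independently and uniformly, the whole group pursues the same cooperative goal with probability 1/G^(N-1). The sketch below is purely illustrative and not taken from the paper; the agent counts, goal counts, and uniform sampling are assumptions used only to show how quickly alignment probability collapses.

```python
import random

def fraction_aligned(num_agents: int, num_goals: int, episodes: int = 100_000) -> float:
    """Estimate how often independently sampled goals coincide across the whole group."""
    aligned = 0
    for _ in range(episodes):
        # Each agent samples its own goal uniformly, with no coordination.
        goals = [random.randrange(num_goals) for _ in range(num_agents)]
        if len(set(goals)) == 1:  # every agent happened to draw the same cooperative goal
            aligned += 1
    return aligned / episodes

if __name__ == "__main__":
    # Hypothetical group sizes and goal-space sizes, chosen for illustration only.
    for n, g in [(2, 10), (3, 10), (3, 50)]:
        print(f"{n} agents, {g} goals: empirical {fraction_aligned(n, g):.4f} "
              f"vs analytic {1 / g ** (n - 1):.4f}")
```

Even with modest goal spaces, the group only rarely samples a shared goal by chance, which is the motivation for mechanisms, such as the paper's Goal-coordination game, that make goal alignment emerge during decentralized training.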