On the Convergence of Competitive, Multi-Agent Gradient-Based Learning

04/16/2018
by Eric Mazumdar, et al.

As learning algorithms are increasingly deployed in markets and other competitive environments, understanding their dynamics becomes ever more important. We study the limiting behavior of competitive agents employing gradient-based learning algorithms. Specifically, we introduce a general framework for competitive gradient-based learning that encompasses a broad class of learning algorithms, including policy gradient reinforcement learning, gradient-based bandits, and certain online convex optimization algorithms. We show that, unlike in the single-agent case, gradient learning schemes in competitive settings do not necessarily correspond to gradient flows and, hence, limiting behaviors such as periodic orbits may exist. We introduce a new class of games, Morse-Smale games, that correspond to gradient-like flows. We provide guarantees that competitive gradient-based learning algorithms (in both the full-information and gradient-free settings) avoid linearly unstable critical points (i.e., strict saddle points and unstable limit cycles). Since generic local Nash equilibria are not unstable critical points---that is, in a formal mathematical sense, almost all Nash equilibria are not strict saddles---these results imply that gradient-based learning almost surely does not get stuck at critical points that do not correspond to Nash equilibria. For Morse-Smale games, we show that competitive gradient learning converges to linearly stable cycles (a class that includes stable Nash equilibria) almost surely. Finally, we specialize these results to commonly used multi-agent learning algorithms and provide illustrative examples that demonstrate the wide range of limiting behaviors competitive gradient learning exhibits.
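To make the non-gradient-flow phenomenon concrete, here is a minimal sketch (not taken from the paper) of simultaneous gradient play on the classic bilinear zero-sum game f(x, y) = x*y, where agent 1 minimizes f over x and agent 2 minimizes -f over y. The game choice, function names, and step count are illustrative assumptions; the point is that the combined vector field (-y, x) is a rotation, not the gradient of any single potential, so the iterates orbit the equilibrium at (0, 0) rather than converging to it.

import numpy as np

def simultaneous_gradient_play(x0, y0, lr=0.01, steps=2000):
    """Run simultaneous gradient descent for both agents on f(x, y) = x*y
    and return the trajectory of joint iterates."""
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(steps):
        gx = y    # gradient of agent 1's cost  x*y  with respect to x
        gy = -x   # gradient of agent 2's cost -x*y  with respect to y
        # Simultaneous (not alternating) updates, one per agent.
        x, y = x - lr * gx, y - lr * gy
        path.append((x, y))
    return np.array(path)

path = simultaneous_gradient_play(1.0, 1.0)
radii = np.linalg.norm(path, axis=1)
# The distance to the unique Nash equilibrium (0, 0) never shrinks; with a
# finite step size the Euler discretization of the rotation slowly spirals
# outward, so the dynamics cycle rather than converge.
print(f"start radius: {radii[0]:.3f}, end radius: {radii[-1]:.3f}")

Running this prints an end radius slightly larger than the start radius, illustrating the periodic (indeed, slowly diverging) limiting behavior that the paper contrasts with the gradient-like flows of Morse-Smale games.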
