Near-Optimal Last-iterate Convergence of Policy Optimization in Zero-sum Polymatrix Markov games

08/15/2023
by   Zailin Ma, et al.
0

Computing approximate Nash equilibria in multi-player general-sum Markov games is a computationally intractable task. However, multi-player Markov games with certain cooperative or competitive structures might circumvent this intractability. In this paper, we focus on multi-player zero-sum polymatrix Markov games, where players interact in a pairwise fashion while remain overall competitive. To the best of our knowledge, we propose the first policy optimization algorithm called Entropy-Regularized Optimistic-Multiplicative-Weights-Update (ER-OMWU) for finding approximate Nash equilibria in finite-horizon zero-sum polymatrix Markov games with full information feedback. We provide last-iterate convergence guarantees for finding an ϵ-approximate Nash equilibrium within Õ(1/ϵ) iterations, which is near-optimal compared to the optimal O(1/ϵ) iteration complexity in two-player zero-sum Markov games, which is a degenerate case of zero-sum polymatrix games with only two players involved. Our algorithm combines the regularized and optimistic learning dynamics with separated smooth value update within a single loop, where players update strategies in a symmetric and almost uncoupled manner. It provides a natural dynamics for finding equilibria and is more probable to be adapted to a sample-efficient and fully decentralized implementation where only partial information feedback is available in the future.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset