Markov Game with Switching Costs

07/13/2021
by Jian Li, et al.

We study a general Markov game with metric switching costs: in each round, the player adaptively chooses one of several Markov chains to advance, with the objective of minimizing the expected cost for at least k chains to reach their target states. If the player decides to play a different chain than in the previous round, an additional switching cost is incurred. The special case with no switching cost was solved optimally by Dumitriu, Tetali, and Winkler [DTW03] via a variant of the celebrated Gittins index for the classical multi-armed bandit (MAB) problem with Markovian rewards [Gittins74, Gittins79]. However, for the MAB problem with a nontrivial switching cost, even a constant one, the classic paper by Banks and Sundaram [BS94] showed that no index strategy can be optimal. In this paper, we complement their result and show that there is a simple index strategy achieving a constant approximation factor when the switching cost is a constant and k=1. To the best of our knowledge, this is the first index strategy that achieves a constant approximation factor for a general MAB variant with switching costs. For general metric switching costs, we propose a more involved constant-factor approximation algorithm, via a nontrivial reduction to the stochastic k-TSP problem, in which each Markov chain is approximated by a random variable. Our analysis makes extensive use of various interesting properties of the Gittins index.
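To make the setup concrete, the following is a minimal simulation sketch of the game described above, under simplifying assumptions: each step of a chain costs one unit, the switching cost is a fixed constant, and the chain-selection rule is a naive placeholder index, not the Gittins index or the strategy analyzed in the paper. The names Chain, naive_index, and play are illustrative and do not come from the paper.

```python
import random

class Chain:
    """A small Markov chain with a designated absorbing target state."""
    def __init__(self, transitions, target, start=0):
        # transitions[s] is a list of (next_state, probability) pairs
        self.transitions = transitions
        self.target = target
        self.state = start

    def done(self):
        return self.state == self.target

    def advance(self):
        # Take one random step from the current state.
        r, acc = random.random(), 0.0
        for nxt, p in self.transitions[self.state]:
            acc += p
            if r <= acc:
                self.state = nxt
                return

def naive_index(chain):
    # Hypothetical stand-in for an index: prefer chains whose state is
    # numerically closer to the target (assumes states are ordered that way).
    return chain.target - chain.state

def play(chains, k=1, switch_cost=2, step_cost=1):
    """Run the game until k chains reach their targets; return the total cost."""
    total, last = 0, None
    while sum(c.done() for c in chains) < k:
        # Pick the unfinished chain with the smallest index value.
        i = min((j for j, c in enumerate(chains) if not c.done()),
                key=lambda j: naive_index(chains[j]))
        if last is not None and i != last:
            total += switch_cost   # pay the constant switching cost
        total += step_cost         # pay to advance the chosen chain one step
        chains[i].advance()
        last = i
    return total

if __name__ == "__main__":
    # Two birth-death style chains: from state s, move forward w.p. 0.6, stay w.p. 0.4.
    def make_chain(n):
        return Chain([[(min(s + 1, n - 1), 0.6), (s, 0.4)] for s in range(n)],
                     target=n - 1)

    random.seed(0)
    costs = [play([make_chain(4), make_chain(6)], k=1) for _ in range(1000)]
    print("average cost:", sum(costs) / len(costs))
```

In this toy version the index is purely heuristic; the paper's point is precisely that with switching costs no index rule is exactly optimal, but a suitable Gittins-style index can still be a constant-factor approximation when the switching cost is constant and k=1.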
