The Intrinsic Robustness of Stochastic Bandits to Strategic Manipulation

06/04/2019
by Zhe Feng, et al.

We study the behavior of stochastic bandit algorithms under strategic behavior conducted by rational actors, i.e., the arms. Each arm is a strategic player who can modify its own reward whenever it is pulled, subject to a cross-period budget constraint. Each arm is self-interested and seeks to maximize its own expected number of pulls over a decision horizon. Strategic manipulations arise naturally in economic applications, e.g., recommendation systems such as Yelp and Amazon. We analyze the robustness of three popular bandit algorithms: UCB, ε-Greedy, and Thompson Sampling. We prove that all three algorithms achieve a regret upper bound of O(max{B, log T}) under any (possibly adaptive) strategy of the strategic arms, where B is the total manipulation budget across arms and T is the decision horizon. Moreover, we prove that this regret upper bound is tight. Our results illustrate the intrinsic robustness of bandit algorithms against strategic manipulation so long as B = o(T). This stands in sharp contrast to the more pessimistic model of adversarial attacks, where an attack budget of only O(log T) can trick UCB and ε-Greedy into pulling the optimal arm only o(T) times. Our results hold for both bounded and unbounded rewards.
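To make the setting concrete, below is a minimal sketch of a standard UCB1 learner playing against strategic arms. The Bernoulli arm means, the per-arm budgets, and the naive "inflate the reward to 1 while budget remains" strategy are all illustrative assumptions, not the paper's construction; the paper allows arbitrary adaptive manipulation strategies.

```python
import numpy as np

# Illustrative sketch: strategic arms can inflate their observed rewards,
# subject to a cross-period budget. All parameters here are assumptions.

rng = np.random.default_rng(0)
T = 10_000                              # decision horizon
means = np.array([0.5, 0.7, 0.9])       # true Bernoulli means; arm 2 is optimal
K = len(means)
budgets = np.array([50.0, 50.0, 0.0])   # per-arm manipulation budgets; B = 100

counts = np.zeros(K)    # number of pulls per arm
sums = np.zeros(K)      # sum of observed (possibly manipulated) rewards

for t in range(1, T + 1):
    if t <= K:
        arm = t - 1     # pull each arm once to initialize
    else:
        # Standard UCB1 index computed on the *observed* rewards.
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))

    reward = rng.binomial(1, means[arm])

    # Assumed greedy arm strategy: inflate the observed reward to 1
    # whenever the arm's remaining budget allows.
    boost = min(budgets[arm], 1.0 - reward)
    budgets[arm] -= boost

    counts[arm] += 1
    sums[arm] += reward + boost

regret = T * means.max() - counts @ means
print(f"pulls per arm: {counts}, pseudo-regret: {regret:.1f}")
```

The intuition the sketch illustrates: once an arm exhausts its budget, its observed empirical mean drifts back toward its true mean, so the total distortion any arm can cause to the learner is capped by its budget, which is why regret degrades only additively in B.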
