Sample Complexity of Incentivized Exploration
We consider incentivized exploration: a version of multi-armed bandits where the choice of actions is controlled by self-interested agents, and the algorithm can only issue recommendations. The algorithm controls the flow of information, and the information asymmetry can incentivize the agents to explore. Prior work matches the optimal regret rates for bandits up to "constant" multiplicative factors determined by the Bayesian prior. However, the dependence on the prior in prior work could be arbitrarily large, and the dependence on the number of arms K could be exponential. The optimal dependence on the prior and K is very unclear. We make progress on these issues. Our first result is that Thompson sampling is incentive-compatible if initialized with enough data points. Thus, we reduce the problem of designing incentive-compatible algorithms to that of sample complexity: (i) How many data points are needed to incentivize Thompson sampling? (ii) How many rounds does it take to collect these samples? We address both questions, providing upper bounds on sample complexity that are typically polynomial in K and lower bounds that are polynomially matching.
READ FULL TEXT