Best-Arm Identification in Correlated Multi-Armed Bandits
In this paper we consider the problem of best-arm identification in multi-armed bandits in the fixed-confidence setting, where the goal is to identify, with probability 1-δ for some δ>0, the arm with the highest mean reward using the smallest possible number of samples from the set of arms 𝒦. Most existing best-arm identification algorithms and analyses operate under the assumption that the rewards corresponding to different arms are independent of each other. We propose a novel correlated bandit framework that captures domain knowledge about correlations between arms in the form of upper bounds on the expected conditional reward of an arm, given a reward realization from another arm. Our proposed algorithm C-LUCB, which generalizes the LUCB algorithm, utilizes this partial knowledge of correlations to sharply reduce the sample complexity of best-arm identification. More interestingly, we show that the total number of samples obtained by C-LUCB is of the form 𝒪(∑_{k ∈ 𝒞} log(1/δ)), as opposed to the typical 𝒪(∑_{k ∈ 𝒦} log(1/δ)) samples required in the independent reward setting. The improvement arises because the 𝒪(log(1/δ)) term is summed only over the set of competitive arms 𝒞, which is a subset of the original set of arms 𝒦. Depending on the problem instance, the set 𝒞 can be as small as 2 arms, so using C-LUCB in the correlated bandit setting can lead to significant performance improvements. Our theoretical findings are supported by experiments on the MovieLens and Goodreads recommendation datasets.
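To make the mechanism concrete, below is a minimal Python sketch of how a C-LUCB-style procedure could combine LUCB's confidence-interval stopping rule with pseudo-reward screening of non-competitive arms. This is an illustration under stated assumptions, not the paper's exact algorithm: the interface (`arms`, `pseudo_rewards`, `beta`), the use of the most-sampled arm as the sole reference when forming the competitive set, and the particular confidence-width function are all assumptions made for the sketch.

```python
import numpy as np

def c_lucb(arms, pseudo_rewards, delta, beta, horizon=100_000):
    """Sketch of a C-LUCB-style routine (interface and details assumed).

    arms           : list of callables; arms[k]() draws a reward in [0, 1]
    pseudo_rewards : pseudo_rewards[l][k](r) upper-bounds E[reward of l | reward of k = r]
    delta          : target error probability (best arm returned w.p. >= 1 - delta)
    beta(n, t)     : confidence-width function, e.g.
                     lambda n, t: np.sqrt(np.log(4 * K * t**2 / delta) / (2 * n))
    """
    K = len(arms)
    n = np.zeros(K)                  # number of pulls per arm
    mean = np.zeros(K)               # empirical mean reward per arm
    phi = np.zeros((K, K))           # phi[l, k]: mean pseudo-reward of arm l w.r.t. arm k

    def pull(k):
        r = arms[k]()
        n[k] += 1
        mean[k] += (r - mean[k]) / n[k]
        for l in range(K):           # every pull of k also updates pseudo-estimates of l
            if l != k:
                phi[l, k] += (pseudo_rewards[l][k](r) - phi[l, k]) / n[k]

    for k in range(K):               # initialization: pull each arm once
        pull(k)

    for t in range(K + 1, horizon):
        # Competitive set: arm l survives if its pseudo-reward mean w.r.t. the
        # most-sampled arm is not already below that arm's empirical mean.
        k_max = int(np.argmax(n))
        comp = [l for l in range(K) if l == k_max or phi[l, k_max] >= mean[k_max]]

        # Standard LUCB step, restricted to the competitive arms only.
        h = max(comp, key=lambda k: mean[k])                     # empirical best
        rest = [k for k in comp if k != h]
        if not rest:
            return h, int(n.sum())
        u = max(rest, key=lambda k: mean[k] + beta(n[k], t))     # top challenger
        if mean[h] - beta(n[h], t) > mean[u] + beta(n[u], t):
            return h, int(n.sum())   # confidence intervals separated: stop
        pull(h)
        pull(u)
    return int(np.argmax(mean)), int(n.sum())
```

Restricting the LUCB step to the competitive set is what the 𝒪(∑_{k ∈ 𝒞} log(1/δ)) bound reflects: non-competitive arms are sampled only during initialization, so only arms in 𝒞 contribute a log(1/δ) term to the total sample count.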