An Arm-wise Randomization Approach to Combinatorial Linear Semi-bandits
Combinatorial linear semi-bandits (CLS) are widely applicable frameworks of sequential decision-making, in which a learner chooses a subset of arms from a given set of arms associated with feature vectors. For this problem, existing algorithms work poorly for the clustered case in which the feature vectors form many large clusters. This is a critical shortcoming in practice because such situations can be found in many applications including recommender systems. In this paper, we clarify the cause of why such a shortcoming appears, and to overcome this, we introduce a key technique of arm-wise randomization. We propose two algorithms with this technique: the perturbed C^2UCB (PC^2UCB) and the Thompson sampling (TS). Our empirical evaluation with artificial and real-world datasets demonstrates that the proposed algorithms with the arm-wise randomization technique outperform the existing algorithms without this technique, especially for the clustered case. Our contributions also include theoretical analyses that provide high probability asymptotic regret bounds for our algorithms.
READ FULL TEXT