Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits
We consider the stochastic linear contextual bandit problem with high-dimensional features. We analyze the Thompson sampling (TS) algorithm, using special classes of sparsity-inducing priors (e.g. spike-and-slab) to model the unknown parameter, and provide a nearly optimal upper bound on the expected cumulative regret. To the best of our knowledge, this is the first work that provides theoretical guarantees of Thompson sampling in high dimensional and sparse contextual bandits. For faster computation, we use spike-and-slab prior to model the unknown parameter and variational inference instead of MCMC to approximate the posterior distribution. Extensive simulations demonstrate improved performance of our proposed algorithm over existing ones.
READ FULL TEXT