A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits
We propose a new strategy for fixed-confidence best-arm identification among Gaussian arms with bounded means and unit variance. This strategy, called Exploration-Biased Sampling, is not only asymptotically optimal: we also prove non-asymptotic bounds that hold with high probability. To the best of our knowledge, this is the first strategy with such guarantees. Its main advantage over other algorithms such as Track-and-Stop, however, is its improved exploration behavior: Exploration-Biased Sampling is slightly biased in favor of exploration in a subtle but natural way that makes it more stable and interpretable. These improvements are made possible by a new analysis of the sample complexity optimization problem, which yields a faster numerical resolution scheme and several quantitative regularity results that we believe to be of high independent interest.
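For readers unfamiliar with the sample complexity optimization problem, the following is a minimal sketch of its standard form in the fixed-confidence literature for unit-variance Gaussian arms: maximize over sampling proportions w the quantity min over suboptimal arms a of w_best * w_a / (w_best + w_a) * (mu_best - mu_a)^2 / 2. The abstract does not state this formula, so its use here is an assumption about the setting. The function names are hypothetical, and the brute-force grid search is only an illustration for two arms; it is not the faster numerical scheme announced above.

```python
def gaussian_rate(w, mu):
    """Max-min objective at proportions w for unit-variance Gaussian arms.

    Assumed standard form (not taken from the abstract):
    min over a != best of w_best*w_a/(w_best+w_a) * (mu_best-mu_a)^2 / 2.
    """
    best = max(range(len(mu)), key=lambda a: mu[a])
    rates = []
    for a in range(len(mu)):
        if a == best:
            continue
        total = w[best] + w[a]
        if total == 0.0:
            # No samples on either arm: the two cannot be distinguished.
            rates.append(0.0)
            continue
        harmonic = w[best] * w[a] / total
        rates.append(harmonic * (mu[best] - mu[a]) ** 2 / 2.0)
    return min(rates)


def two_arm_characteristic_time(mu, grid=1000):
    """Grid search over w = (w1, 1 - w1) for a two-armed instance.

    Returns the best proportions found and the inverse of the best rate.
    """
    best_w, best_rate = None, -1.0
    for i in range(1, grid):
        w = (i / grid, 1.0 - i / grid)
        rate = gaussian_rate(w, mu)
        if rate > best_rate:
            best_w, best_rate = w, rate
    return best_w, 1.0 / best_rate


if __name__ == "__main__":
    w_star, t_star = two_arm_characteristic_time((1.0, 0.0))
    print(w_star, t_star)
```

With two arms and a gap of 1 between the means, the search settles on equal proportions and a characteristic time of 8, matching the closed form 8 / gap^2 for this case.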