Non-Asymptotic Sequential Tests for Overlapping Hypotheses and application to near optimal arm identification in bandit models
In this paper, we study sequential testing problems with overlapping hypotheses. We first focus on the simple problem of assessing if the mean μ of a Gaussian distribution is ≥ -ϵ or ≤ϵ; if μ∈(-ϵ,ϵ), both answers are considered to be correct. Then, we consider PAC-best arm identification in a bandit model: given K probability distributions on R with means μ_1,...,μ_K, we derive the asymptotic complexity of identifying, with risk at most δ, an index I∈{1,...,K} such that μ_I≥_iμ_i -ϵ. We provide non asymptotic bounds on the error of a parallel General Likelihood Ratio Test, which can also be used for more general testing problems. We further propose lower bound on the number of observation needed to identify a correct hypothesis. Those lower bounds rely on information-theoretic arguments, and specifically on two versions of a change of measure lemma (a high-level form, and a low-level form) whose relative merits are discussed.
READ FULL TEXT