Adversarial Bandits Robust to S-Switch Regret
We study the adversarial bandit problem under S number of switching best arms for unknown S. For handling this problem, we adopt the master-base framework using the online mirror descent method (OMD). We first provide a master-base algorithm with basic OMD, achieving Õ(S^1/2K^1/3T^2/3). For improving the regret bound with respect to T, we propose to use adaptive learning rates for OMD to control variance of loss estimators, and achieve Õ(min{𝔼[√(SKTρ_T(h^†))],S√(KT)}), where ρ_T(h^†) is a variance term for loss estimators.
READ FULL TEXT