Adversarial Bandits Robust to S-Switch Regret

05/30/2022
by   Jung-Hun Kim, et al.
0

We study the adversarial bandit problem under S number of switching best arms for unknown S. For handling this problem, we adopt the master-base framework using the online mirror descent method (OMD). We first provide a master-base algorithm with basic OMD, achieving Õ(S^1/2K^1/3T^2/3). For improving the regret bound with respect to T, we propose to use adaptive learning rates for OMD to control variance of loss estimators, and achieve Õ(min{𝔼[√(SKTρ_T(h^†))],S√(KT)}), where ρ_T(h^†) is a variance term for loss estimators.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro