Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

05/31/2020
by Tor Lattimore, et al.

We prove that the information-theoretic upper bound on the minimax regret for adversarial bandit convex optimisation is at most O(d^3 √n log(n)), improving on O(d^{9.5} √n log(n)^{7.5}) by Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
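
To get a rough feel for the size of the improvement, the two bounds can be compared numerically. The sketch below is only illustrative: it reads d as the dimension and n as the number of rounds (the usual convention, assumed here), it ignores the constants hidden by the O-notation, and the function names are hypothetical rather than taken from the paper. Under those assumptions the ratio of the old bound to the new one simplifies to (d log(n))^6.5.

```python
import math


def new_bound(d, n):
    # d^3 * sqrt(n) * log(n), with the hidden constant taken as 1 (assumption)
    return d**3 * math.sqrt(n) * math.log(n)


def old_bound(d, n):
    # d^9.5 * sqrt(n) * log(n)^7.5, with the hidden constant taken as 1 (assumption)
    return d**9.5 * math.sqrt(n) * math.log(n) ** 7.5


def improvement_factor(d, n):
    # Ratio old/new; the sqrt(n) terms cancel, leaving (d * log(n))^6.5
    return old_bound(d, n) / new_bound(d, n)


if __name__ == "__main__":
    for d, n in [(2, 10**4), (10, 10**6)]:
        print(f"d={d}, n={n}: old/new ratio = {improvement_factor(d, n):.3e}")
```

Because the constants are unknown, the printed ratios indicate scaling only, not actual regret values.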
