Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs
In this paper, we study the problem of stochastic linear bandits with finite action sets. Most existing work assumes the payoffs are bounded or sub-Gaussian, an assumption that may be violated in scenarios such as financial markets. To address this issue, we analyze linear bandits with heavy-tailed payoffs, where the payoffs admit finite (1+ϵ)-th moments for some ϵ∈(0,1]. Through median of means and dynamic truncation, we propose two novel algorithms which enjoy a sublinear regret bound of O(d^(1/2) T^(1/(1+ϵ))), where d is the dimension of contextual information and T is the time horizon. Meanwhile, we provide an Ω(d^(ϵ/(1+ϵ)) T^(1/(1+ϵ))) lower bound, which implies that our upper bound matches the lower bound up to polylogarithmic factors in the order of d and T when ϵ=1. Finally, we conduct numerical experiments to demonstrate the effectiveness of our algorithms, and the empirical results strongly support our theoretical guarantees.
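The two robustness techniques named above, median of means and truncation, can be illustrated as simple mean estimators. This is a minimal sketch under our own assumptions: the function names and the fixed truncation threshold b are illustrative, whereas the paper's algorithms apply these ideas inside a linear-bandit loop with a dynamically chosen threshold.

```python
import numpy as np

def median_of_means(samples, k):
    """Median-of-means estimate of the mean of `samples`.

    Split the samples into k groups, average each group, and return
    the median of the group means. The median step damps the effect
    of a few extreme heavy-tailed observations.
    """
    samples = np.asarray(samples, dtype=float)
    groups = np.array_split(samples, k)
    group_means = [g.mean() for g in groups]
    return float(np.median(group_means))

def truncated_mean(samples, b):
    """Mean estimate that zeroes out samples with magnitude above b.

    Truncation trades a small bias for bounded influence of outliers,
    which is what makes it viable under only finite (1+eps) moments.
    """
    samples = np.asarray(samples, dtype=float)
    clipped = np.where(np.abs(samples) <= b, samples, 0.0)
    return float(clipped.mean())
```

For example, `median_of_means([1, 2, 3, 4, 5, 6], 3)` averages the pairs (1,2), (3,4), (5,6) and returns the median of the three group means, while `truncated_mean([1, 100, 2, 3], 10)` ignores the outlier 100 when averaging.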