Autoregressive Bandits

12/12/2022
by Francesco Bacchiocchi, et al.

Autoregressive processes naturally arise in a large variety of real-world scenarios, including, e.g., stock markets, sales forecasting, weather prediction, advertising, and pricing. When addressing a sequential decision-making problem in such a context, the temporal dependence between consecutive observations should be properly accounted for in order to converge to the optimal decision policy. In this work, we propose a novel online learning setting, named Autoregressive Bandits (ARBs), in which the observed reward follows an autoregressive process of order k, whose parameters depend on the action the agent chooses within a finite set of n actions. We then devise an optimistic regret minimization algorithm, AutoRegressive Upper Confidence Bounds (AR-UCB), that suffers regret of order 𝒪( (k+1)^3/2√(nT)/(1-Γ)^2), where T is the optimization horizon and Γ < 1 is an index of the stability of the system. Finally, we present a numerical validation in several synthetic settings and one real-world setting, comparing against general- and specific-purpose bandit baselines and showing the advantages of the proposed approach.
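To make the setting concrete, the sketch below simulates an ARB instance: the reward follows an AR(k) process whose coefficients depend on the chosen action, and the learner fits a per-action ridge regression on the last k rewards with an elliptical optimism bonus. This is an illustrative stand-in under assumed parameter values and a simplified confidence width, not the authors' exact AR-UCB algorithm or constants.

```python
import numpy as np

rng = np.random.default_rng(0)

k, n_actions, T = 2, 3, 2000
# Assumed true AR parameters per action: [bias, lag-1 coef, lag-2 coef].
# Each row is stable: the lag coefficients sum below 1 (the Gamma < 1 condition).
true_params = np.array([
    [0.2, 0.3, 0.1],   # action 0, long-run mean 0.2 / (1 - 0.4) = 0.33
    [0.5, 0.2, 0.2],   # action 1, long-run mean 0.5 / (1 - 0.4) = 0.83 (best)
    [0.1, 0.4, 0.1],   # action 2, long-run mean 0.1 / (1 - 0.5) = 0.20
])

def step(action, history):
    """Next reward from the AR(k) process selected by the action."""
    z = np.concatenate(([1.0], history))          # [1, r_{t-1}, ..., r_{t-k}]
    return true_params[action] @ z + 0.05 * rng.standard_normal()

# Per-action ridge regression of the reward on [1, last k rewards],
# plus a simple optimism bonus standing in for the paper's confidence bound.
lam = 1.0
A = np.stack([lam * np.eye(k + 1) for _ in range(n_actions)])
b = np.zeros((n_actions, k + 1))
history = np.zeros(k)
rewards = []
for t in range(T):
    z = np.concatenate(([1.0], history))
    ucb = np.empty(n_actions)
    for a in range(n_actions):
        A_inv = np.linalg.inv(A[a])
        theta = A_inv @ b[a]                      # ridge estimate for action a
        bonus = np.sqrt(z @ A_inv @ z)            # elliptical confidence width
        ucb[a] = theta @ z + bonus
    a = int(np.argmax(ucb))                       # optimistic action choice
    r = step(a, history)
    A[a] += np.outer(z, z)                        # rank-one design update
    b[a] += r * z
    history = np.concatenate(([r], history[:-1])) # shift the reward window
    rewards.append(r)

print(round(float(np.mean(rewards[-500:])), 3))   # late average reward
```

With these assumed parameters, the late average reward should approach the long-run mean of the best action, since temporal dependence means the optimal policy is judged by the stationary reward it induces rather than by a single-step payoff.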
