Advances in Bandits with Knapsacks
"Bandits with Knapsacks" () is a general model for multi-armed bandits under supply/budget constraints. While worst-case regret bounds for are well-understood, we focus on logarithmic instance-dependent regret bounds. We largely resolve them for one limited resource other than time, and for known, deterministic resource consumption. We also bound regret within a given round ("simple regret"). One crucial technique analyzes the sum of the confidence terms of the chosen arms. This technique allows to import the insights from prior work on bandits without resources, which leads to several extensions.
READ FULL TEXT