A Multi-Arm Bandit Approach To Subset Selection Under Constraints
We explore the class of problems in which a central planner must select a subset of agents, each with a different quality and cost. The planner wants to maximize its utility while ensuring that the average quality of the selected agents stays above a certain threshold. When the agents' qualities are known, we formulate the problem as an integer linear program (ILP) and propose a deterministic algorithm that provides an exact solution to the ILP. We then consider the setting in which the agents' qualities are unknown. We model this as a Multi-Arm Bandit (MAB) problem and propose to learn the qualities over multiple rounds. We show that after a certain number of rounds, τ, the algorithm outputs a subset of agents that satisfies the average quality constraint with high probability. Next, we provide bounds on τ and prove that after τ rounds the algorithm incurs a regret of O(ln T), where T is the total number of rounds. We further illustrate the efficacy of the algorithm through simulations. To overcome its computational limitations, we propose a polynomial-time greedy algorithm that provides an approximate solution to the ILP. Finally, we compare the performance of the exact and greedy algorithms through experiments.
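The abstract does not state the ILP in full, so the following is a minimal sketch under stated assumptions: the planner's utility is taken to be the total quality minus total cost of the selected agents, and the average-quality constraint is linearized as Σ_i (q_i − α) x_i ≥ 0 to avoid division. The function name select_subset, the PuLP modeling choices, and the example data are illustrative assumptions, not the authors' formulation.

# A minimal sketch of the known-qualities ILP, using the PuLP library.
# Assumption (not specified in the abstract): utility = total quality
# minus total cost of the selected agents.
import pulp

def select_subset(qualities, costs, alpha):
    n = len(qualities)
    prob = pulp.LpProblem("subset_selection", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x_{i}", cat="Binary") for i in range(n)]

    # Hypothetical objective: total quality minus total cost of selected agents.
    prob += pulp.lpSum(qualities[i] * x[i] - costs[i] * x[i] for i in range(n))

    # Average-quality constraint (average >= alpha), linearized:
    # sum_i q_i x_i >= alpha * sum_i x_i  <=>  sum_i (q_i - alpha) x_i >= 0.
    prob += pulp.lpSum((qualities[i] - alpha) * x[i] for i in range(n)) >= 0

    # Require at least one agent so the average quality is well defined.
    prob += pulp.lpSum(x) >= 1

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [i for i in range(n) if x[i].value() > 0.5]

# Example: three agents with (quality, cost) pairs and threshold alpha = 0.6.
print(select_subset([0.9, 0.5, 0.7], [0.2, 0.1, 0.3], 0.6))

In the unknown-qualities setting described above, the same selection step would be run on empirical quality estimates maintained over rounds, rather than on the true qualities.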