A Multi-Arm Bandit Approach To Subset Selection Under Constraints

02/09/2021
by   Ayush Deva, et al.
0

We explore the class of problems where a central planner needs to select a subset of agents, each with different quality and cost. The planner wants to maximize its utility while ensuring that the average quality of the selected agents is above a certain threshold. When the agents' quality is known, we formulate our problem as an integer linear program (ILP) and propose a deterministic algorithm, namely that provides an exact solution to our ILP. We then consider the setting when the qualities of the agents are unknown. We model this as a Multi-Arm Bandit (MAB) problem and propose to learn the qualities over multiple rounds. We show that after a certain number of rounds, τ, outputs a subset of agents that satisfy the average quality constraint with a high probability. Next, we provide bounds on τ and prove that after τ rounds, the algorithm incurs a regret of O(ln T), where T is the total number of rounds. We further illustrate the efficacy of through simulations. To overcome the computational limitations of , we propose a polynomial-time greedy algorithm, namely , that provides an approximate solution to our ILP. We also compare the performance of and through experiments.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset