Optimal Fair Multi-Agent Bandits

06/07/2023
by   Amir Leshem, et al.
0

In this paper, we study the problem of fair multi-agent multi-arm bandit learning when agents do not communicate with each other, except collision information, provided to agents accessing the same arm simultaneously. We provide an algorithm with regret O(N^3 log N log T ) (assuming bounded rewards, with unknown bound). This significantly improves previous results which had regret of order O(log T loglog T) and exponential dependence on the number of agents. The result is attained by using a distributed auction algorithm to learn the sample-optimal matching, a new type of exploitation phase whose length is derived from the observed samples, and a novel order-statistics-based regret analysis. Simulation results present the dependence of the regret on log T.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset