Multi-player Multi-Armed Bandits with non-zero rewards on collisions for uncoordinated spectrum access
In this paper, we study the uncoordinated spectrum access problem using the multi-player multi-armed bandits framework. We consider a model with no central control in which the users cannot communicate with each other. The environment may appear different to different users, i.e., the mean reward of a particular channel may differ from user to user. Additionally, we allow colliding users to receive non-zero rewards. Under this setup, we present a policy that achieves expected regret of order O(log^{2+δ} T) for some δ > 0.
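To make the reward model concrete, the following is a minimal sketch of one round of play, not the paper's policy. It assumes Bernoulli rewards and a separate per-user, per-channel mean that applies on collision; neither assumption comes from the abstract, and the names `play_round`, `mu`, and `mu_collision` are hypothetical.

```python
import random
from collections import Counter


def play_round(mu, mu_collision, actions, rng):
    """Simulate one round of uncoordinated channel access.

    mu[k][j]           -- mean reward user k sees on channel j when alone
                          (differs across users: heterogeneous environment).
    mu_collision[k][j] -- ASSUMED mean reward for user k on channel j when
                          at least one other user picks the same channel.
    actions[k]         -- channel chosen by user k this round.
    Rewards are ASSUMED Bernoulli; the abstract does not fix a distribution.
    """
    counts = Counter(actions)  # number of users on each chosen channel
    rewards = []
    for user, channel in enumerate(actions):
        collided = counts[channel] > 1
        mean = mu_collision[user][channel] if collided else mu[user][channel]
        rewards.append(1 if rng.random() < mean else 0)
    return rewards


# Two users, two channels, with degenerate means so the outcome is deterministic.
mu = [[1.0, 0.0], [0.0, 1.0]]            # each user prefers a different channel
mu_collision = [[0.0, 0.0], [1.0, 1.0]]  # user 1 still earns reward on collision
rng = random.Random(0)
no_collision = play_round(mu, mu_collision, [0, 1], rng)
collision = play_round(mu, mu_collision, [0, 0], rng)
```

In this sketch each user observes only its own reward, which matches the no-communication setting: nothing in `play_round` reveals to a user whether it collided.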