Best-of-Both-Worlds Algorithms for Partial Monitoring
This paper considers the partial monitoring problem with k actions and d outcomes and provides the first best-of-both-worlds algorithms, whose regrets are bounded poly-logarithmically in the stochastic regime and near-optimally in the adversarial regime. Specifically, we show that for non-degenerate locally observable games, the regret in the stochastic regime is bounded by O(k^3 m^2 log(T) log(k_Π T) / Δ_min) and in the adversarial regime by O(k^2/3 m √(T log(T) log k_Π)), where T is the number of rounds, m is the maximum number of distinct observations per action, Δ_min is the minimum suboptimality gap, and k_Π is the number of Pareto optimal actions. Moreover, we show that for non-degenerate globally observable games, the regret in the stochastic regime is bounded by O(max{c_𝒢^2/k, c_𝒢} log(T) log(k_Π T) / Δ_min^2) and in the adversarial regime by O((max{c_𝒢^2/k, c_𝒢} log(T) log(k_Π T))^1/3 T^2/3), where c_𝒢 is a game-dependent constant. Our algorithms are based on the follow-the-regularized-leader framework, adapted to the nature of the partial monitoring problem and inspired by algorithms from the field of online learning with feedback graphs.
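To make the follow-the-regularized-leader (FTRL) framework mentioned above concrete, the following is a minimal sketch of FTRL with a negative-entropy regularizer over the action simplex, which reduces to an exponential-weights update. The loss estimator, the learning-rate schedule, and the feedback model below are hypothetical placeholders for illustration only; they are not the paper's actual construction, which builds estimators from partial monitoring feedback symbols in a feedback-graph-inspired way.

```python
# Minimal FTRL sketch, assuming a bandit-style importance-weighted loss
# estimator and a 1/sqrt(t) learning rate (both hypothetical stand-ins).
import numpy as np

def ftrl_neg_entropy(cum_loss_est, eta):
    """FTRL with negative-entropy regularization: the minimizer over the
    simplex is a softmax of the negated cumulative loss estimates."""
    logits = -eta * cum_loss_est
    logits -= logits.max()          # shift for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def run_sketch(k=5, T=1000, seed=0):
    rng = np.random.default_rng(seed)
    cum_loss_est = np.zeros(k)
    p = np.full(k, 1.0 / k)
    for t in range(1, T + 1):
        eta = 1.0 / np.sqrt(t)      # hypothetical decreasing learning rate
        p = ftrl_neg_entropy(cum_loss_est, eta)
        a = rng.choice(k, p=p)      # sample an action from the FTRL weights
        # In partial monitoring only a feedback symbol is observed, not the
        # loss; here we substitute a placeholder observed loss and a standard
        # importance-weighted unbiased estimate for the chosen action.
        observed_loss = rng.random()
        loss_est = np.zeros(k)
        loss_est[a] = observed_loss / p[a]
        cum_loss_est += loss_est
    return p

if __name__ == "__main__":
    print(run_sketch())
```

The negative-entropy choice is what makes the update multiplicative and keeps every action's probability positive, which is one standard starting point for the adaptive, gap-dependent learning rates that best-of-both-worlds analyses rely on.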