We consider a contextual bandit problem with S contexts and A actions. I...
Regret Matching+ (RM+) and its variants are important algorithms for sol...
For tabular data sets, we explore data and model distillation, as well a...
A recent paper by Piliouras et al. [2021, 2022] introduces an uncoupled ...
A recent line of work has established uncoupled learning dynamics such t...
In this paper we establish efficient and uncoupled learning dynamics so ...
While extensive-form games (EFGs) can be converted into normal-form game...
Policy optimization is a widely-used method in reinforcement learning. D...
Regret-based algorithms are highly efficient at finding approximate Nash...
In this work, we develop linear bandit algorithms that automatically ada...
We study infinite-horizon discounted two-player zero-sum Markov games, a...
Optimistic Gradient Descent Ascent (OGDA) algorithm for saddle-point opt...
We develop a new approach to obtaining high probability regret bounds fo...
We study small-loss bounds for the adversarial multi-armed bandits probl...
We propose the first contextual bandit algorithm that is parameter-free,...
In this paper, we propose a novel deep learning architecture for multi-l...