We provide non-asymptotic bounds for the well-known temporal difference ...
We tackle the problem of online reward maximisation over a large finite ...
Thompson Sampling has been demonstrated in many complex bandit models, h...
Online learning algorithms often need to recompute least squares regr...
We propose a stochastic approximation (SA) based method with randomizati...
The question of the optimality of Thompson Sampling for solving the stoc...