Mirror descent value iteration (MDVI), an abstraction of Kullback-Leible...
In this work, we consider and analyze the sample complexity of model-fre...
We present ShinRL, an open-source library specialized for the evaluation...
The recent booming of entropy-regularized literature reveals that Kullba...
In this paper, we propose cautious policy programming (CPP), a novel val...
The oscillating performance of off-policy learning and persisting errors...