Knowledge distillation is commonly used for compressing neural networks ...
Despite the seeming success of contemporary grounded text generation sys...
Mirror descent value iteration (MDVI), an abstraction of Kullback-Leible...
In this work, we consider and analyze the sample complexity of model-fre...
The Q-function is a central quantity in many Reinforcement Learning (RL)...
Offline Reinforcement Learning (RL) aims at learning an optimal control ...
Offline Reinforcement Learning methods seek to learn a policy from logge...
Bootstrapping is a core mechanism in Reinforcement Learning (RL). Most a...
Building upon the formalism of regularized Markov decision processes, we...
We adapt the optimization's concept of momentum to reinforcement learnin...
Dynamic Programming (DP) provides standard algorithms to solve Markov De...
Conservative Policy Iteration (CPI) is a founding algorithm of Approxima...