We study the convergence behavior of the celebrated temporal-difference ...
Decision-focused (DF) model-based reinforcement learning has recently be...
Advances in reinforcement learning have led to its successful applicatio...
In the reinforcement learning literature, there are many algorithms deve...
We employ Proximal Iteration for value-function optimization in reinforc...
Off-policy policy evaluation methods for sequential decision making can ...
Principled decision-making in continuous state–action spaces is impossib...
Importance sampling-based estimators for off-policy evaluation (OPE) are...
The fundamental assumption of reinforcement learning in Markov decision ...
Finding an effective medical treatment often requires a search by trial ...
Off-policy evaluation in reinforcement learning offers the chance of usi...
Tensor decomposition methods allow us to learn the parameters of latent ...
We consider a model-based approach to perform batch off-policy evaluatio...
Sepsis is the leading cause of mortality in the ICU. It is challenging t...
In this work, we consider the problem of estimating a behaviour policy f...
Much attention has been devoted recently to the development of machine l...
We study the problem of off-policy policy evaluation (OPPE) in RL. In co...
Tensor decomposition methods are popular tools for learning latent varia...