Smoothed functional-based gradient algorithms for off-policy reinforcement learning

01/06/2021
by Nithia Vijayan, et al.

We consider the problem of control in an off-policy reinforcement learning (RL) context. We propose a policy gradient algorithm that incorporates smoothed functional-based gradient estimation. We provide an asymptotic convergence guarantee for the proposed algorithm using the ordinary differential equation (ODE) approach. Further, we derive a non-asymptotic bound that quantifies the rate of convergence of the proposed algorithm.
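
To make the idea of smoothed functional-based gradient estimation concrete, the following is a minimal sketch of a two-sided Gaussian smoothed functional estimator applied to a simple deterministic test function. It is not the algorithm from the paper: the off-policy setting and the policy objective are not modeled here, and the names `sf_gradient`, `quadratic`, `mu`, and `n_samples` are hypothetical choices made for illustration only.

```python
import random


def sf_gradient(f, theta, mu=0.1, n_samples=1000, rng=None):
    """Estimate the gradient of f at theta using random Gaussian perturbations.

    Illustrative sketch only: averages
        delta * (f(theta + mu*delta) - f(theta - mu*delta)) / (2*mu)
    over standard normal perturbation vectors delta.
    """
    rng = rng or random.Random()
    d = len(theta)
    grad = [0.0] * d
    for _ in range(n_samples):
        # Draw one standard normal perturbation direction.
        delta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        plus = [t + mu * z for t, z in zip(theta, delta)]
        minus = [t - mu * z for t, z in zip(theta, delta)]
        # Two function evaluations give a finite-difference slope along delta.
        scale = (f(plus) - f(minus)) / (2.0 * mu)
        for i in range(d):
            grad[i] += scale * delta[i] / n_samples
    return grad


def quadratic(theta):
    # Hypothetical stand-in objective; its true gradient is 2 * theta.
    return sum(t * t for t in theta)


estimate = sf_gradient(
    quadratic, [1.0, -2.0], mu=0.1, n_samples=20000, rng=random.Random(0)
)
```

The estimator needs only function evaluations, not analytic derivatives. In this sketch the estimate should lie close to the true gradient `[2.0, -4.0]`, with accuracy improving as `n_samples` grows.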
