Chat Image Generator Video Music Voice Chat Photo Editor

Classical Policy Gradient: Preserving Bellman's Principle of Optimality

06/06/2019

∙

We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.

READ FULL TEXT

Success!

An error occurred

Classical Policy Gradient: Preserving Bellman's Principle of Optimality

Sign in with Google

Consider DeepAI Pro