On the connection between Bregman divergence and value in regularized Markov decision processes

10/21/2022
by Brendan O'Donoghue, et al.

In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process. This result has implications for multi-task reinforcement learning, offline reinforcement learning, and regret analysis under function approximation, among other areas.
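For concreteness, here is a minimal sketch of the flavor of such a result in the entropy-regularized special case, where the Bregman divergence generated by the negative-entropy regularizer is the KL divergence. The notation (temperature $\tau$, discount $\gamma$, regularized values $V^{\pi}_{\tau}, V^{*}_{\tau}$, optimal policy $\pi^{*}$) is illustrative and not taken from the paper itself:

$$
V^{*}_{\tau}(s) - V^{\pi}_{\tau}(s)
\;=\;
\mathbb{E}_{\pi}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\,\tau\,
\mathrm{KL}\big(\pi(\cdot \mid s_t)\,\big\|\,\pi^{*}(\cdot \mid s_t)\big)
\;\Big|\; s_0 = s \right].
$$

In this sketch, $\mathrm{KL}(\pi \,\|\, \pi^{*})$ is the Bregman divergence from the current policy to the optimal policy, taken along the states visited by the current policy, which matches the direction of the divergence stated in the abstract.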
