Lipschitz Continuity in Model-based Reinforcement Learning
Model-based reinforcement-learning methods learn transition and reward models and use them to guide behavior. We analyze the impact of learning models that are Lipschitz continuous, meaning the distance between the function's outputs at two inputs is bounded by a constant multiple of the distance between those inputs. Our first result shows a tight bound on model errors for multi-step predictions with Lipschitz continuous models. We go on to prove an error bound for the value-function estimate arising from such models and show that the estimated value function is itself Lipschitz continuous. We conclude with empirical results that demonstrate significant benefits from enforcing Lipschitz continuity of neural net models during reinforcement learning.
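As an illustration only, and not necessarily the construction used in the paper, one common way to enforce Lipschitz continuity in a learned neural transition model is spectral normalization, which bounds each linear layer's Lipschitz constant by 1 so that a network of 1-Lipschitz activations is 1-Lipschitz end to end. The class and dimension names below are hypothetical; this is a minimal PyTorch sketch, assuming a model that predicts the next state from a state-action pair.

```python
import torch
import torch.nn as nn

class LipschitzTransitionModel(nn.Module):
    """Hypothetical transition model s' = f(s, a) with a 1-Lipschitz network.

    Spectral normalization constrains each weight matrix's largest singular
    value to 1; composed with 1-Lipschitz ReLU activations, the whole
    network is 1-Lipschitz with respect to the Euclidean metric.
    """

    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.utils.spectral_norm(nn.Linear(state_dim + action_dim, hidden_dim)),
            nn.ReLU(),
            nn.utils.spectral_norm(nn.Linear(hidden_dim, state_dim)),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Concatenate state and action, then predict the next state.
        return self.net(torch.cat([state, action], dim=-1))

# Usage: a 4-dimensional state and 2-dimensional action (dimensions are arbitrary).
model = LipschitzTransitionModel(state_dim=4, action_dim=2)
s, a = torch.randn(1, 4), torch.randn(1, 2)
print(model(s, a).shape)  # torch.Size([1, 4])
```

Bounding the model's Lipschitz constant in this way is what makes multi-step prediction error controllable: each rollout step can amplify input error by at most that constant, so errors compound geometrically rather than arbitrarily.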