Provably Efficient Gauss-Newton Temporal Difference Learning Method with Function Approximation
In this paper, inspired by Fitted Q-Iteration (FQI), we propose a Gauss-Newton Temporal Difference (GNTD) method to solve the Q-value estimation problem with function approximation. In each iteration, unlike the original FQI, which solves a nonlinear least-squares subproblem to fit the Q-iteration, GNTD can be viewed as an inexact FQI that takes only a single Gauss-Newton step on this subproblem, which is much cheaper computationally. Compared to the popular Temporal Difference (TD) learning, which can be viewed as taking a single gradient-descent step on FQI's subproblem per iteration, the Gauss-Newton step of GNTD better preserves the structure of FQI and hence leads to better convergence. We derive finite-sample, non-asymptotic convergence guarantees for GNTD under linear, neural-network, and general smooth function approximations. In particular, recent works on neural TD only guarantee a suboptimal 𝒪(ϵ^-4) sample complexity, while GNTD achieves an improved 𝒪̃(ϵ^-2) complexity. Finally, we validate our method via extensive experiments on both online and offline RL problems. GNTD exhibits both higher rewards and faster convergence than TD-type methods, including DQN.
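To make the "one Gauss-Newton step on FQI's subproblem" idea concrete, here is a minimal sketch under linear function approximation, where Q(s,a) = φ(s,a)ᵀθ and the Jacobian of the model is simply the feature matrix. The batch formulation, the damping parameter, and the function name `gntd_step` are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def gntd_step(theta, phi, phi_next, rewards, gamma=0.99, damping=1e-3):
    """One (damped) Gauss-Newton TD step for linear Q(s,a) = phi(s,a) @ theta.

    phi:      (n, d) features of sampled (s, a) pairs
    phi_next: (n, d) features of the greedy next-state actions
    rewards:  (n,)   sampled rewards

    The FQI subproblem at the current iterate theta_t is
        min_theta 0.5/n * || phi @ theta - y ||^2,
        y = r + gamma * phi_next @ theta_t   (target frozen at theta_t).
    A damped Gauss-Newton step on this least-squares objective is
        theta <- theta - (J^T J / n + lambda I)^{-1} J^T residual / n,
    where J = phi is the Jacobian of the linear model.
    """
    n, d = phi.shape
    y = rewards + gamma * phi_next @ theta           # frozen Bellman target
    residual = phi @ theta - y                       # current fitting error
    gram = phi.T @ phi / n + damping * np.eye(d)     # damped Gauss-Newton matrix
    step = np.linalg.solve(gram, phi.T @ residual / n)
    return theta - step
```

In the linear case, an undamped Gauss-Newton step solves the least-squares subproblem exactly, recovering FQI's fit; for neural-network approximation, φ would be replaced by the network's Jacobian at the current parameters, and the single step is a genuine inexact approximation to the subproblem's solution.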