Improving Sample Complexity Bounds for Actor-Critic Algorithms

04/27/2020
by Tengyu Xu, et al.

The actor-critic (AC) algorithm is a popular method for finding an optimal policy in reinforcement learning. Finite-sample convergence rates for the AC and natural actor-critic (NAC) algorithms have been established recently, but only under independent and identically distributed (i.i.d.) sampling with a single-sample update at each iteration. In contrast, this paper characterizes the convergence rate and sample complexity of AC and NAC under Markovian sampling, with mini-batch data at each iteration, and with the actor using a general policy class for approximation. We show that the overall sample complexity for mini-batch AC to attain an ϵ-accurate stationary point improves on the best known sample complexity of AC by an order of O((1/ϵ)·log(1/ϵ)). We also show that the overall sample complexity for mini-batch NAC to attain an ϵ-accurate globally optimal point improves on the known sample complexity of natural policy gradient (NPG) by an order of O((1/ϵ)/log(1/ϵ)). Our study develops several novel techniques for the finite-sample analysis of RL algorithms, including handling the bias error due to mini-batch Markovian sampling and exploiting a self-variance-reduction property to improve the convergence analysis of NAC.
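To make the sampling model concrete, below is a minimal sketch of mini-batch actor-critic under Markovian sampling: transitions are collected along a single unreset trajectory of the current policy (rather than i.i.d. draws), a mini-batch of B consecutive transitions feeds an averaged TD(0) critic step, and the actor then takes one policy-gradient step using the critic's TD error as the advantage estimate. This is illustrative only, not the authors' implementation; the small random MDP, the tabular softmax policy, and the step sizes alpha, beta and batch size B are all assumptions made for the example.

```python
import numpy as np

# Sketch of mini-batch actor-critic (AC) under Markovian sampling.
# A small random MDP stands in for the environment (an assumption,
# not the paper's setting); all hyperparameters are illustrative.

rng = np.random.default_rng(0)
S, A, gamma = 5, 2, 0.9                     # states, actions, discount
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = next-state distribution
R = rng.uniform(0, 1, size=(S, A))          # reward table

theta = np.zeros((S, A))                    # actor: tabular softmax parameters
w = np.zeros(S)                             # critic: tabular value estimates V_w(s)
alpha, beta = 0.05, 0.1                     # actor / critic step sizes (assumed)
B, iters = 32, 500                          # mini-batch size, outer iterations

def pi(s):
    """Softmax policy over actions in state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

s = 0  # Markovian sampling: the chain is never reset between iterations
for _ in range(iters):
    # Collect B consecutive (hence correlated) transitions from one trajectory.
    batch = []
    for _ in range(B):
        a = rng.choice(A, p=pi(s))
        s2 = rng.choice(S, p=P[s, a])
        batch.append((s, a, R[s, a], s2))
        s = s2

    # Critic update: one averaged TD(0) semi-gradient step over the mini-batch.
    dw = np.zeros_like(w)
    for st, at, r, st2 in batch:
        dw[st] += r + gamma * w[st2] - w[st]
    w += beta / B * dw

    # Actor update: policy gradient with the TD error as advantage estimate.
    grad = np.zeros_like(theta)
    for st, at, r, st2 in batch:
        delta = r + gamma * w[st2] - w[st]
        g = -pi(st)
        g[at] += 1.0                        # grad of log softmax w.r.t. theta[st]
        grad[st] += delta * g
    theta += alpha / B * grad
```

An NAC variant would keep the same Markovian mini-batch sampling and critic step but precondition the actor update with an estimate of the inverse Fisher information matrix; the paper's analysis covers both algorithms under this sampling model.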
