Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity

02/17/2020
by   Simon S. Du, et al.
0

The current paper studies the problem of agnostic Q-learning with function approximation in deterministic systems where the optimal Q-function is approximable by a function in the class F with approximation error δ> 0. We propose a novel recursion-based algorithm and show that if δ = O(ρ/√(_E)), then one can find the optimal policy using O(_E) trajectories, where ρ is the gap between the optimal Q-value of the best actions and that of the second-best actions and _E is the Eluder dimension of F. Our result has two implications: 1) In conjunction with the lower bound in [Du et al., ICLR 2020], our upper bound suggests that the condition δ = Θ(ρ/√(dim_E)) is necessary and sufficient for algorithms with polynomial sample complexity. 2) In conjunction with the lower bound in [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity Θ(dim_E) is tight even in the agnostic setting. Therefore, we settle the open problem on agnostic Q-learning proposed in [Wen and Van Roy, NIPS 2013]. We further extend our algorithm to the stochastic reward setting and obtain similar results.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro