Regret Minimization in Partially Observable Linear Quadratic Control

01/31/2020
by Sahin Lale, et al.

We study the problem of regret minimization in partially observable linear quadratic control systems when the model dynamics are unknown a priori. We propose ExpCommit, an explore-then-commit algorithm that first learns the Markov parameters of the model and then follows the principle of optimism in the face of uncertainty to design a controller. We introduce a novel regret decomposition and use it to derive an end-to-end sublinear regret upper bound for partially observable linear quadratic control. Finally, we provide stability guarantees and establish a regret upper bound of Õ(T^{2/3}) for ExpCommit, where T is the time horizon of the problem.
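The T^{2/3} rate is typical of explore-then-commit schemes, and a small numerical sketch can show where it comes from. The sketch below is not taken from the abstract: it assumes, purely for illustration, that exploring for t_exp steps costs regret proportional to t_exp and that the per-step regret after committing shrinks like 1/sqrt(t_exp). The names heuristic_regret and best_exploration_length are hypothetical, and constants and logarithmic factors are ignored.

```python
import math


def heuristic_regret(t_exp: int, horizon: int) -> float:
    """Illustrative regret proxy (an assumption, not the paper's bound).

    Exploration phase: regret grows linearly with its length t_exp.
    Commit phase: each of the remaining steps pays about 1/sqrt(t_exp),
    mimicking an estimation error that decays at the usual parametric rate.
    """
    return t_exp + (horizon - t_exp) / math.sqrt(t_exp)


def best_exploration_length(horizon: int) -> int:
    """Brute-force the exploration length that minimizes the proxy."""
    return min(range(1, horizon), key=lambda t: heuristic_regret(t, horizon))


if __name__ == "__main__":
    for horizon in (10**3, 10**4, 10**5):
        t_exp = best_exploration_length(horizon)
        print(
            f"T={horizon:>6}  best t_exp={t_exp:>5}  "
            f"t_exp / T^(2/3)={t_exp / horizon ** (2 / 3):.2f}  "
            f"proxy regret={heuristic_regret(t_exp, horizon):.1f}"
        )
```

Under these assumptions the minimizing exploration length stays a roughly constant fraction of T^{2/3} as T grows, and the resulting proxy regret grows at about the same rate, which is consistent with the Õ(T^{2/3}) bound stated above.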
