Optimal Dynamic Regret in LQR Control

06/18/2022
by Dheeraj Baby, et al.

We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control. We provide an efficient online algorithm that achieves an optimal dynamic (policy) regret of Õ(max{n^{1/3} 𝒯𝒱(M_{1:n})^{2/3}, 1}), where 𝒯𝒱(M_{1:n}) is the total variation of any oracle sequence of Disturbance Action policies parameterized by M_1, ..., M_n, chosen in hindsight to cater to unknown nonstationarity. This rate improves on the best known rate of Õ(√(n (𝒯𝒱(M_{1:n}) + 1))) for general convex losses, and we prove that it is information-theoretically optimal for LQR. The main technical components include the reduction of LQR to online linear regression with delayed feedback due to Foster and Simchowitz (2020), as well as a new proper learning algorithm with an optimal Õ(n^{1/3}) dynamic regret on a family of “minibatched” quadratic losses, which could be of independent interest.
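
For concreteness, the quantities being compared can be written out as follows. This is a sketch based on standard conventions in the nonstochastic control literature; the exact dynamic-regret definition and the choice of norm in the total-variation term are assumptions, not taken verbatim from the paper.

% Dynamic (policy) regret against a time-varying comparator sequence of
% Disturbance Action policies M_1, ..., M_n; J_t denotes the cost incurred
% at round t (regret definition and norm are assumed conventions).
\[
  \mathrm{Regret}_n(M_{1:n})
    = \sum_{t=1}^{n} J_t(\mathrm{alg}) - \sum_{t=1}^{n} J_t(M_t),
  \qquad
  \mathcal{TV}(M_{1:n}) = \sum_{t=2}^{n} \lVert M_t - M_{t-1} \rVert .
\]
% The rate obtained here for quadratic losses, versus the best known rate
% for general convex losses:
\[
  \tilde{O}\bigl(\max\{\, n^{1/3}\,\mathcal{TV}(M_{1:n})^{2/3},\, 1 \,\}\bigr)
  \quad\text{vs.}\quad
  \tilde{O}\bigl(\sqrt{n\,(\mathcal{TV}(M_{1:n}) + 1)}\,\bigr).
\]

In particular, for a static comparator (𝒯𝒱(M_{1:n}) = 0) the new bound collapses to Õ(1), whereas the general-convex rate remains Õ(√n).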
