Non-Episodic Learning for Online LQR of Unknown Linear Gaussian System


Abstract in English

This paper considers the data-driven linear-quadratic regulation (LQR) problem where the system parameters are unknown and need to be identified in real time. Contrary to existing system identification and data-driven control methods, which typically require either offline data collection or multiple resets, we propose an online non-episodic algorithm that gains knowledge about the system from a single trajectory. The algorithm guarantees that both the identification error and the suboptimality gap of control performance in this trajectory converge to zero almost surely. Furthermore, we characterize the almost sure convergence rates of identification and control, and reveal an optimal trade-off between exploration and exploitation. We provide a numerical example to illustrate the effectiveness of our proposed strategy.

Download