No Arabic abstract
In this work, we study the problem of learning partially observed linear dynamical systems from a single sample trajectory. A major practical challenge in the existing system identification methods is the undesirable dependency of their required sample size on the system dimension: roughly speaking, they presume and rely on sample sizes that scale linearly with respect to the system dimension. Evidently, in high-dimensional regime where the system dimension is large, it may be costly, if not impossible, to collect as many samples from the unknown system. In this paper, we will remedy this undesirable dependency on the system dimension by introducing an $ell_1$-regularized estimation method that can accurately estimate the Markov parameters of the system, provided that the number of samples scale logarithmically with the system dimension. Our result significantly improves the sample complexity of learning partially observed linear dynamical systems: it shows that the Markov parameters of the system can be learned in the high-dimensional setting, where the number of samples is significantly smaller than the system dimension. Traditionally, the $ell_1$-regularized estimators have been used to promote sparsity in the estimated parameters. By resorting to the notion of weak sparsity, we show that, irrespective of the true sparsity of the system, a similar regularized estimator can be used to reduce the sample complexity of learning partially observed linear systems, provided that the true system is inherently stable.
We propose a theoretical framework for approximate planning and learning in partially observed systems. Our framework is based on the fundamental notion of information state. We provide two equivalent definitions of information state -- i) a function of history which is sufficient to compute the expected reward and predict its next value; ii) equivalently, a function of the history which can be recursively updated and is sufficient to compute the expected reward and predict the next observation. An information state always leads to a dynamic programming decomposition. Our key result is to show that if a function of the history (called approximate information state (AIS)) approximately satisfies the properties of the information state, then there is a corresponding approximate dynamic program. We show that the policy computed using this is approximately optimal with bounded loss of optimality. We show that several approximations in state, observation and action spaces in literature can be viewed as instances of AIS. In some of these cases, we obtain tighter bounds. A salient feature of AIS is that it can be learnt from data. We present AIS based multi-time scale policy gradient algorithms. and detailed numerical experiments with low, moderate and high dimensional environments.
Attempts from different disciplines to provide a fundamental understanding of deep learning have advanced rapidly in recent years, yet a unified framework remains relatively limited. In this article, we provide one possible way to align existing branches of deep learning theory through the lens of dynamical system and optimal control. By viewing deep neural networks as discrete-time nonlinear dynamical systems, we can analyze how information propagates through layers using mean field theory. When optimization algorithms are further recast as controllers, the ultimate goal of training processes can be formulated as an optimal control problem. In addition, we can reveal convergence and generalization properties by studying the stochastic dynamics of optimization algorithms. This viewpoint features a wide range of theoretical study from information bottleneck to statistical physics. It also provides a principled way for hyper-parameter tuning when optimal control theory is introduced. Our framework fits nicely with supervised learning and can be extended to other learning problems, such as Bayesian learning, adversarial training, and specific forms of meta learning, without efforts. The review aims to shed lights on the importance of dynamics and optimal control when developing deep learning theory.
We study safe, data-driven control of (Markov) jump linear systems with unknown transition probabilities, where both the discrete mode and the continuous state are to be inferred from output measurements. To this end, we develop a receding horizon estimator which uniquely identifies a sub-sequence of past mode transitions and the corresponding continuous state, allowing for arbitrary switching behavior. Unlike traditional approaches to mode estimation, we do not require an offline exhaustive search over mode sequences to determine the size of the observation window, but rather select it online. If the system is weakly mode observable, the window size will be upper bounded, leading to a finite-memory observer. We integrate the estimation procedure with a simple distributionally robust controller, which hedges against misestimations of the transition probabilities due to finite sample sizes. As additional mode transitions are observed, the used ambiguity sets are updated, resulting in continual improvements of the control performance. The practical applicability of the approach is illustrated on small numerical examples.
When training the parameters of a linear dynamical model, the gradient descent algorithm is likely to fail to converge if the squared-error loss is used as the training loss function. Restricting the parameter space to a smaller subset and running the gradient descent algorithm within this subset can allow learning stable dynamical systems, but this strategy does not work for unstable systems. In this work, we look into the dynamics of the gradient descent algorithm and pinpoint what causes the difficulty of learning unstable systems. We show that observations taken at different times from the system to be learned influence the dynamics of the gradient descent algorithm in substantially different degrees. We introduce a time-weighted logarithmic loss function to fix this imbalance and demonstrate its effectiveness in learning unstable systems.
We consider the problem of robust and adaptive model predictive control (MPC) of a linear system, with unknown parameters that are learned along the way (adaptive), in a critical setting where failures must be prevented (robust). This problem has been studied from different perspectives by different communities. However, the existing theory deals only with the case of quadratic costs (the LQ problem), which limits applications to stabilisation and tracking tasks only. In order to handle more general (non-convex) costs that naturally arise in many practical problems, we carefully select and bring together several tools from different communities, namely non-asymptotic linear regression, recent results in interval prediction, and tree-based planning. Combining and adapting the theoretical guarantees at each layer is non trivial, and we provide the first end-to-end suboptimality analysis for this setting. Interestingly, our analysis naturally adapts to handle many models and combines with a data-driven robust model selection strategy, which enables to relax the modelling assumptions. Last, we strive to preserve tractability at any stage of the method, that we illustrate on two challenging simulated environments.