Learning-based Control of Unknown Linear Systems with Thompson Sampling

Added by Rahul Jain

Publication date: 2017

Fields: Informatics Engineering

Research language: English

Authors: Yi Ouyang, Mukul Gagrani, Rahul Jain

Systems and Control

Abstract

We propose a Thompson sampling-based learning algorithm for the Linear Quadratic (LQ) control problem with unknown system parameters. The algorithm, called Thompson sampling with dynamic episodes (TSDE), uses two stopping criteria to determine the lengths of the dynamic episodes in Thompson sampling. The first stopping criterion controls the growth rate of the episode length. The second stopping criterion is triggered when the determinant of the sample covariance matrix falls below half of its value at the beginning of the current episode. Under some conditions on the prior distribution, we show that the expected (Bayesian) regret of TSDE accumulated up to time $T$ is bounded by $\tilde{O}(\sqrt{T})$, where $\tilde{O}(\cdot)$ hides constants and logarithmic factors. This is the first $\tilde{O}(\sqrt{T})$ bound on the expected regret of learning in LQ control. By introducing a reinitialization schedule, we also show that the algorithm is robust to time-varying drift in model parameters. Numerical simulations are provided to illustrate the performance of TSDE.
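To make the two stopping criteria concrete, here is a minimal Python sketch of a TSDE-style loop on a scalar system $x_{t+1} = a x_t + b u_t + w_t$ with unknown $\theta = (a, b)$, a Gaussian prior, and Gaussian noise. It is not the authors' implementation: the scalar setting, the function names (`tsde`, `riccati_gain`, `sample_posterior`, `posterior_update`), the initial previous-episode length of 1, and the clamping of $|b|$ away from zero are assumptions made for illustration. At the start of each episode the sketch draws one parameter sample from the posterior and applies the corresponding LQ gain until either criterion fires.

```python
import math
import random


def det2(m):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def riccati_gain(a, b, q=1.0, r=1.0, iters=500):
    """LQ gain K (u = -K x) for x' = a x + b u + w, by iterating the scalar Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def sample_posterior(mu, cov, rng, b_min=0.1):
    """Draw theta = (a, b) from N(mu, cov) using a hand-rolled 2x2 Cholesky factor.

    |b| is clamped to at least b_min: a crude stand-in (assumption of this
    sketch) for restricting the prior to well-behaved parameters.
    """
    l11 = math.sqrt(max(cov[0][0], 1e-12))
    l21 = cov[1][0] / l11
    l22 = math.sqrt(max(cov[1][1] - l21 * l21, 0.0))
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    a = mu[0] + l11 * e1
    b = mu[1] + l21 * e1 + l22 * e2
    if abs(b) < b_min:
        b = math.copysign(b_min, b)
    return a, b


def posterior_update(mu, cov, z, y, sigma):
    """Bayesian linear-regression update for y = z^T theta + N(0, sigma^2)."""
    sz = [cov[0][0] * z[0] + cov[0][1] * z[1], cov[1][0] * z[0] + cov[1][1] * z[1]]
    denom = sigma ** 2 + z[0] * sz[0] + z[1] * sz[1]
    resid = y - (mu[0] * z[0] + mu[1] * z[1])
    new_mu = [mu[0] + sz[0] * resid / denom, mu[1] + sz[1] * resid / denom]
    new_cov = [[cov[i][j] - sz[i] * sz[j] / denom for j in range(2)] for i in range(2)]
    return new_mu, new_cov


def tsde(a_true, b_true, horizon, sigma=1.0, q=1.0, r=1.0, seed=0):
    """Run a TSDE-style loop; return (average cost, list of episode lengths)."""
    rng = random.Random(seed)
    mu, cov = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]  # Gaussian prior on (a, b)
    x, total_cost, t, prev_len = 0.0, 0.0, 0, 1  # prev_len = 1 seeds the growth rule
    lengths = []
    while t < horizon:
        t_k, det_k = t, det2(cov)  # episode start time and covariance determinant
        gain = riccati_gain(*sample_posterior(mu, cov, rng), q=q, r=r)  # one sample per episode
        while t < horizon:
            u = -gain * x
            total_cost += q * x * x + r * u * u
            x_next = a_true * x + b_true * u + rng.gauss(0.0, sigma)
            mu, cov = posterior_update(mu, cov, (x, u), x_next, sigma)
            x, t = x_next, t + 1
            # Criterion 1: the episode is at most one step longer than the previous one.
            # Criterion 2: the covariance determinant has halved since the episode began.
            if t > t_k + prev_len or det2(cov) < 0.5 * det_k:
                break
        prev_len = t - t_k
        lengths.append(prev_len)
    return total_cost / horizon, lengths


if __name__ == "__main__":
    avg_cost, lengths = tsde(a_true=0.9, b_true=0.5, horizon=2000, seed=1)
    print(f"{len(lengths)} episodes, longest {max(lengths)}, average cost {avg_cost:.2f}")
```

By construction each episode in this sketch is at most one step longer than the one before it, while the determinant rule can end an episode early whenever new data shrinks the posterior quickly.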

Sample-Based Learning Model Predictive Control for Linear Uncertain Systems

Ugo Rosolia, Francesco Borrelli (2019)

We present a sample-based Learning Model Predictive Controller (LMPC) for constrained uncertain linear systems subject to bounded additive disturbances. The proposed controller builds on earlier work on LMPC for deterministic systems. First, we introduce the design of the safe set and value function used to guarantee safety and performance improvement. Afterwards, we show how these quantities can be approximated using noisy historical data. The effectiveness of the proposed approach is demonstrated on a numerical example. We show that the proposed LMPC is able to safely explore the state space and to iteratively improve the worst-case closed-loop performance, while robustly satisfying state and input constraints.

Systems and Control

A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems

Mukul Gagrani, Sagar Sudhakara, Aditya Mahajan (2021)

We revisit the Thompson sampling algorithm for controlling an unknown linear quadratic (LQ) system recently proposed by Ouyang et al. (arXiv:1709.04047). The regret bound of that algorithm was derived under a technical assumption on the induced norm of the closed-loop system. In this technical note, we show that by making a minor modification to the algorithm (in particular, ensuring that an episode does not end too soon), this technical assumption on the induced norm can be replaced by a milder assumption in terms of the spectral radius of the closed-loop system. The modified algorithm has the same Bayesian regret of $\tilde{\mathcal{O}}(\sqrt{T})$, where $T$ is the time horizon and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in $T$.
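The note's change concerns when an episode may end. One hypothetical way to encode "an episode does not end too soon" is to let the determinant criterion fire only after a minimum number of steps; the Python function below sketches that reading. The name `should_stop`, the parameter `t_min`, and the exact form of the guard are assumptions, not the note's stated algorithm.

```python
def should_stop(t, t_k, prev_len, det_now, det_start, t_min=0):
    """Decide whether the episode that began at time t_k ends at time t.

    t_min = 0 recovers the two TSDE stopping criteria; t_min > 0 adds a
    hypothetical minimum-length guard on the determinant criterion only.
    """
    too_long = t > t_k + prev_len             # growth-rate criterion
    info_gained = det_now < 0.5 * det_start   # covariance determinant has halved
    long_enough = t - t_k >= t_min            # hypothetical minimum-length guard
    return too_long or (info_gained and long_enough)


# Determinant halved after 5 steps: ends without the guard, continues with t_min = 8.
print(should_stop(5, 0, 10, 0.1, 1.0))           # True
print(should_stop(5, 0, 10, 0.1, 1.0, t_min=8))  # False
```

In this sketch the growth-rate criterion is left untouched, so episode lengths still grow by at most one step per episode regardless of `t_min`.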

Systems and Control, Artificial Intelligence

Safe Learning-Based Control of Stochastic Jump Linear Systems: a Distributionally Robust Approach

Mathijs Schuurmans, Pantelis Sopasakis, Panagiotis Patrinos (2019)

We consider the problem of designing control laws for stochastic jump linear systems where the disturbances are drawn randomly from a finite sample space according to an unknown distribution, which is estimated from a finite sample of i.i.d. observations. We adopt a distributionally robust approach to compute a mean-square stabilizing feedback gain with a given probability. The larger the sample size, the less conservative the controller, yet our methodology gives stability guarantees with high probability, for any number of samples. Using tools from statistical learning theory, we estimate confidence regions for the unknown probability distributions (ambiguity sets) which have the shape of total variation balls centered around the empirical distribution. We use these confidence regions in the design of appropriate distributionally robust controllers and show that the associated stability conditions can be cast as a tractable linear matrix inequality (LMI) by using conjugate duality. The resulting design procedure scales gracefully with the size of the probability space and the system dimensions. Through a numerical example, we illustrate the superior sample complexity of the proposed methodology over the stochastic approach.

Systems and Control

Simultaneous active parameter estimation and control using sampling-based Bayesian reinforcement learning

Patrick Slade, Preston Culbertson, Zachary Sunberg (2017)

Robots performing manipulation tasks must operate under uncertainty about both their pose and the dynamics of the system. To remain robust to modeling error and shifts in payload dynamics, agents must simultaneously perform estimation and control tasks. However, the optimal estimation actions are often not the optimal actions for accomplishing the control tasks, so agents must trade off exploration against exploitation. This work frames the problem as a Bayes-adaptive Markov decision process and solves it online using Monte Carlo tree search (MCTS) and an extended Kalman filter to handle Gaussian process noise and parameter uncertainty in a continuous space. MCTS selects control actions that reduce model uncertainty and reach the goal state nearly optimally. Certainty-equivalent model predictive control is used as a benchmark to compare performance in simulations with varying process noise and parameter uncertainty.

Systems and Control

Thompson Sampling for Contextual Bandits with Linear Payoffs

Shipra Agrawal, Navin Goyal (2012)

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to the state-of-the-art methods. However, many questions regarding its theoretical performance remained open. In this paper, we design and analyze a generalization of Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary. This is among the most important and widely studi

Machine Learning, Data Structures and Algorithms
