Learning-based Control of Unknown Linear Systems with Thompson Sampling

87 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Rahul Jain

تاريخ النشر 2017

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Yi Ouyang - Mukul Gagrani - Rahul Jain

أنظمة وتحكم

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We propose a Thompson sampling-based learning algorithm for the Linear Quadratic (LQ) control problem with unknown system parameters. The algorithm is called Thompson sampling with dynamic episodes (TSDE) where two stopping criteria determine the lengths of the dynamic episodes in Thompson sampling. The first stopping criterion controls the growth rate of episode length. The second stopping criterion is triggered when the determinant of the sample covariance matrix is less than half of the previous value. We show under some conditions on the prior distribution that the expected (Bayesian) regret of TSDE accumulated up to time T is bounded by O(sqrt{T}). Here O(.) hides constants and logarithmic factors. This is the first O(sqrt{T} ) bound on expected regret of learning in LQ control. By introducing a reinitialization schedule, we also show that the algorithm is robust to time-varying drift in model parameters. Numerical simulations are provided to illustrate the performance of TSDE.

قيم البحث

84 - Ugo Rosolia , Francesco Borrelli 2019

We present a sample-based Learning Model Predictive Controller (LMPC) for constrained uncertain linear systems subject to bounded additive disturbances. The proposed controller builds on earlier work on LMPC for deterministic systems. First, we intro duce the design of the safe set and value function used to guarantee safety and performance improvement. Afterwards, we show how these quantities can be approximated using noisy historical data. The effectiveness of the proposed approach is demonstrated on a numerical example. We show that the proposed LMPC is able to safely explore the state space and to iteratively improve the worst-case closed-loop performance, while robustly satisfying state and input constraints.

أنظمة وتحكم

A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems

73 - Mukul Gagrani , Sagar Sudhakara , Aditya Mahajan 2021

We revisit the Thompson sampling algorithm to control an unknown linear quadratic (LQ) system recently proposed by Ouyang et al (arXiv:1709.04047). The regret bound of the algorithm was derived under a technical assumption on the induced norm of the closed loop system. In this technical note, we show that by making a minor modification in the algorithm (in particular, ensuring that an episode does not end too soon), this technical assumption on the induced norm can be replaced by a milder assumption in terms of the spectral radius of the closed loop system. The modified algorithm has the same Bayesian regret of $tilde{mathcal{O}}(sqrt{T})$, where $T$ is the time-horizon and the $tilde{mathcal{O}}(cdot)$ notation hides logarithmic terms in~$T$.

أنظمة وتحكم الذكاء الاصطناعي أنظمة وتحكم

Safe Learning-Based Control of Stochastic Jump Linear Systems: a Distributionally Robust Approach

135 - Mathijs Schuurmans , Pantelis Sopasakis , Panagiotis Patrinos 2019

We consider the problem of designing control laws for stochastic jump linear systems where the disturbances are drawn randomly from a finite sample space according to an unknown distribution, which is estimated from a finite sample of i.i.d. observat ions. We adopt a distributionally robust approach to compute a mean-square stabilizing feedback gain with a given probability. The larger the sample size, the less conservative the controller, yet our methodology gives stability guarantees with high probability, for any number of samples. Using tools from statistical learning theory, we estimate confidence regions for the unknown probability distributions (ambiguity sets) which have the shape of total variation balls centered around the empirical distribution. We use these confidence regions in the design of appropriate distributionally robust controllers and show that the associated stability conditions can be cast as a tractable linear matrix inequality (LMI) by using conjugate duality. The resulting design procedure scales gracefully with the size of the probability space and the system dimensions. Through a numerical example, we illustrate the superior sample complexity of the proposed methodology over the stochastic approach.

أنظمة وتحكم

Simultaneous active parameter estimation and control using sampling-based Bayesian reinforcement learning

110 - Patrick Slade , Preston Culbertson , Zachary Sunberg 2017

Robots performing manipulation tasks must operate under uncertainty about both their pose and the dynamics of the system. In order to remain robust to modeling error and shifts in payload dynamics, agents must simultaneously perform estimation and co ntrol tasks. However, the optimal estimation actions are often not the optimal actions for accomplishing the control tasks, and thus agents trade between exploration and exploitation. This work frames the problem as a Bayes-adaptive Markov decision process and solves it online using Monte Carlo tree search and an extended Kalman filter to handle Gaussian process noise and parameter uncertainty in a continuous space. MCTS selects control actions to reduce model uncertainty and reach the goal state nearly optimally. Certainty equivalent model predictive control is used as a benchmark to compare performance in simulations with varying process noise and parameter uncertainty.

أنظمة وتحكم

Thompson Sampling for Contextual Bandits with Linear Payoffs

212 - Shipra Agrawal , Navin Goyal 2012

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical pe rformance compared to the state-of-the-art methods. However, many questions regarding its theoretical performance remained open. In this paper, we design and analyze a generalization of Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary. This is among the most important and widely studi

التعلم الآلي بنى وهياكل البيانات والخوارزميات التعلم الالي

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

المعهد الوطني لإدارة الأعمال

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Learning-based Control of Unknown Linear Systems with Thompson Sampling

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً