
Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

Posted by Adel Javanmard
Publication date: 2013
Language: English





We study the problem of adaptive control of a high dimensional linear quadratic (LQ) system. Previous work established the asymptotic convergence to an optimal controller for various adaptive control schemes. More recently, for the average cost LQ problem, a regret bound of $O(\sqrt{T})$ was shown, apart from logarithmic factors. However, this bound scales exponentially with $p$, the dimension of the state space. In this work we consider the case where the matrices describing the dynamics of the LQ system are sparse and their dimensions are large. We present an adaptive control scheme that achieves a regret bound of $O(p\sqrt{T})$, apart from logarithmic factors. In particular, our algorithm has an average cost of $(1+\epsilon)$ times the optimum cost after $T = \mathrm{polylog}(p)\,O(1/\epsilon^2)$. This is in contrast to previous work on dense dynamics, where the algorithm requires time that scales exponentially with the dimension in order to achieve regret of $\epsilon$ times the optimal cost. We believe that our result has prominent applications in the emerging area of computational advertising, in particular targeted online advertising and advertising in social networks.
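For context, the average-cost LQ setting referred to above is standard; a minimal statement of the dynamics, the objective, and the regret quantity the bounds describe (notation is generic, not the paper's exact conventions):

```latex
x_{t+1} = A\,x_t + B\,u_t + w_t, \qquad
J(\pi) = \limsup_{T\to\infty}\frac{1}{T}\,
  \mathbb{E}\!\left[\sum_{t=1}^{T} x_t^\top Q\,x_t + u_t^\top R\,u_t\right],
\qquad
R(T) = \sum_{t=1}^{T}\left(x_t^\top Q\,x_t + u_t^\top R\,u_t\right) - T\,J^{*},
```

where $x_t \in \mathbb{R}^p$ is the state, $u_t$ the control input, $w_t$ process noise, $Q, R$ positive (semi)definite cost matrices, and $J^{*}$ the optimal average cost. Sparsity here refers to the matrices $A$ and $B$.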




Read also

We establish that an optimistic variant of Q-learning applied to a fixed-horizon episodic Markov decision process with an aggregated state representation incurs regret $\tilde{\mathcal{O}}(\sqrt{H^5 M K} + \epsilon H K)$, where $H$ is the horizon, $M$ is the number of aggregate states, $K$ is the number of episodes, and $\epsilon$ is the largest difference between any pair of optimal state-action values associated with a common aggregate state. Notably, this regret bound does not depend on the number of states or actions and indicates that asymptotic per-period regret is no greater than $\epsilon$, independent of horizon. To our knowledge, this is the first such result that applies to reinforcement learning with nontrivial value function approximation without any restrictions on transition probabilities.
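A minimal sketch of the general technique this abstract describes: fixed-horizon Q-learning over aggregate states with an optimism bonus. The aggregation map `phi`, the `env.reset()/step()` interface, and the bonus constant `c` and its exact form are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

# Illustrative sketch: optimistic Q-learning on an aggregated state space.
# phi(s) maps a raw state to one of M aggregate states; env is any episodic
# simulator exposing reset() and step(a) -> (next_state, reward, done).
def optimistic_q_aggregated(env, phi, H, M, A, K, c=1.0):
    Q = np.full((H, M, A), float(H))          # optimistic initialization
    N = np.zeros((H, M, A))                   # visit counts per (step, agg-state, action)
    for _ in range(K):
        s = env.reset()
        for h in range(H):
            m = phi(s)
            a = int(np.argmax(Q[h, m]))       # act greedily w.r.t. optimistic values
            s_next, r, done = env.step(a)
            N[h, m, a] += 1
            alpha = (H + 1) / (H + N[h, m, a])        # decaying learning rate
            bonus = c * np.sqrt(H**3 / N[h, m, a])    # optimism bonus (illustrative form)
            future = 0.0 if (done or h == H - 1) else Q[h + 1, phi(s_next)].max()
            Q[h, m, a] = (1 - alpha) * Q[h, m, a] + alpha * min(H, r + future + bonus)
            if done:
                break
            s = s_next
    return Q
```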
Ye Tian, Yang Feng (2021)
In this work, we study the transfer learning problem under high-dimensional generalized linear models (GLMs), which aims to improve the fit on target data by borrowing information from useful source data. Given which sources to transfer, we propose an oracle algorithm and derive its $\ell_2$-estimation error bounds. The theoretical analysis shows that under certain conditions, when the target and source are sufficiently close to each other, the estimation error bound could be improved over that of the classical penalized estimator using only target data. When we don't know which sources to transfer, an algorithm-free transferable source detection approach is introduced to detect informative sources. The detection consistency is proved under the high-dimensional GLM transfer learning setting. Extensive simulations and a real-data experiment verify the effectiveness of our algorithms.
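A minimal sketch of the pool-then-correct idea behind this kind of transfer estimator, specialized to a sparse linear (Gaussian) model with scikit-learn's Lasso; the penalty values, the two-step structure shown here, and the function name are assumptions, not the paper's exact estimator.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Illustrative two-step transfer sketch for a sparse linear model:
# (1) fit a pooled penalized estimate on source + target data,
# (2) fit a sparse correction on the target residuals, then combine.
def transfer_lasso(X_src, y_src, X_tgt, y_tgt, lam_pool=0.05, lam_corr=0.05):
    X_pool = np.vstack([X_src, X_tgt])
    y_pool = np.concatenate([y_src, y_tgt])
    pooled = Lasso(alpha=lam_pool).fit(X_pool, y_pool)        # step 1: borrow from sources
    resid = y_tgt - X_tgt @ pooled.coef_ - pooled.intercept_
    corr = Lasso(alpha=lam_corr).fit(X_tgt, resid)            # step 2: target-specific correction
    return pooled.coef_ + corr.coef_
```

When the source and target coefficient vectors are close, the correction term is small and sparse, which is the mechanism by which pooled information can improve on a target-only fit.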
We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression and scaled sparse linear regression) combined with efficient active set selection strategies. In addition, the library allows users to choose different sparsity-inducing regularizers, including the convex $\ell_1$, nonconvex MCP and SCAD regularizers. The library is coded in C++ and has user-friendly R and Python wrappers. Numerical experiments demonstrate that picasso can scale up to large problems efficiently.
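To make the core idea concrete, here is a minimal illustration of pathwise coordinate optimization for the plain $\ell_1$-penalized lasso. This is not picasso's code or API; it only shows the warm-started lambda path and coordinate-wise soft-thresholding that such libraries build on.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

# Minimal pathwise coordinate descent for the lasso (illustration only):
# solve min_beta 1/(2n)||y - X beta||_2^2 + lam ||beta||_1 along a decreasing
# lambda path, warm-starting each problem from the previous solution.
# A production implementation would add the active-set selection strategies
# described above; this sketch sweeps all coordinates.
def lasso_path(X, y, n_lambdas=20, max_sweeps=100, tol=1e-6):
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n              # per-coordinate curvature
    lam_max = np.max(np.abs(X.T @ y)) / n          # smallest lambda giving an all-zero solution
    lambdas = lam_max * np.logspace(0, -2, n_lambdas)
    beta, path = np.zeros(p), []
    for lam in lambdas:
        for _ in range(max_sweeps):
            max_delta = 0.0
            for j in range(p):
                r_j = y - X @ beta + X[:, j] * beta[j]          # partial residual
                new_bj = soft_threshold(X[:, j] @ r_j / n, lam) / col_sq[j]
                max_delta = max(max_delta, abs(new_bj - beta[j]))
                beta[j] = new_bj
            if max_delta < tol:
                break
        path.append((lam, beta.copy()))
    return path
```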
The mixed linear regression (MLR) model is among the most exemplary statistical tools for modeling non-linear distributions using a mixture of linear models. When the additive noise in the MLR model is Gaussian, the Expectation-Maximization (EM) algorithm is a widely used algorithm for maximum likelihood estimation of MLR parameters. However, when the noise is non-Gaussian, the steps of the EM algorithm may not have closed-form update rules, which makes the EM algorithm impractical. In this work, we study the maximum likelihood estimation of the parameters of the MLR model when the additive noise has a non-Gaussian distribution. In particular, we consider the case where the noise has a Laplacian distribution, and we first show that, unlike the Gaussian case, the resulting sub-problems of the EM algorithm do not have closed-form update rules, thus preventing us from using EM in this case. To overcome this issue, we propose a new algorithm that combines the alternating direction method of multipliers (ADMM) with the EM algorithm. Our numerical experiments show that our method outperforms the EM algorithm in statistical accuracy and computational time in the non-Gaussian noise case.
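As a reference point for the contrast drawn above, here is a minimal EM sketch for two-component MLR with Gaussian noise, where both steps are closed form; equal mixing weights and a known shared noise variance are simplifying assumptions. The Laplacian-noise case discussed above is exactly the setting in which this closed-form M-step is unavailable.

```python
import numpy as np

# Minimal EM for two-component mixed linear regression with Gaussian noise.
def em_mlr_gaussian(X, y, n_iters=100, sigma2=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = rng.standard_normal((2, p))
    for _ in range(n_iters):
        # E-step: posterior probability that each sample came from component 0
        r0 = np.exp(-(y - X @ beta[0]) ** 2 / (2 * sigma2))
        r1 = np.exp(-(y - X @ beta[1]) ** 2 / (2 * sigma2))
        w = r0 / (r0 + r1 + 1e-12)
        # M-step: weighted least squares per component (closed form under Gaussian noise)
        for k, wk in enumerate((w, 1.0 - w)):
            Xw = X * wk[:, None]
            beta[k] = np.linalg.solve(X.T @ Xw + 1e-8 * np.eye(p), Xw.T @ y)
    return beta
```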
In this work, we propose a robust approach to design distributed controllers for unknown-but-sparse linear and time-invariant systems. By leveraging modern techniques in distributed controller synthesis and structured linear inverse problems as applied to system identification, we show that near-optimal distributed controllers can be learned with sub-linear sample complexity and computed with near-linear time complexity, both measured with respect to the dimension of the system. In particular, we provide sharp end-to-end guarantees on the stability and the performance of the designed distributed controller and prove that for sparse systems, the number of samples needed to guarantee robust and near-optimal performance of the designed controller can be significantly smaller than the dimension of the system. Finally, we show that the proposed optimization problem can be solved to global optimality with near-linear time complexity by iteratively solving a series of small quadratic programs.
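A minimal sketch of the "structured linear inverse problem" step this abstract mentions: estimating sparse dynamics matrices $(A, B)$ from trajectory data via one lasso regression per state coordinate. The penalty value and function name are assumptions; the paper's estimator and its guarantees differ in detail.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Illustrative sketch: identify sparse dynamics x_{t+1} = A x_t + B u_t + w_t
# from a single trajectory by solving a lasso regression per state coordinate.
def identify_sparse_dynamics(X, U, lam=0.01):
    # X: (T+1, p) array of states, U: (T, m) array of inputs
    Z = np.hstack([X[:-1], U])                   # regressors [x_t, u_t]
    p = X.shape[1]
    AB = np.zeros((p, Z.shape[1]))
    for i in range(p):
        AB[i] = Lasso(alpha=lam, fit_intercept=False).fit(Z, X[1:, i]).coef_
    return AB[:, :p], AB[:, p:]                  # estimated (A, B)
```

Sparsity of the true rows is what allows each regression to succeed with far fewer samples than the system dimension, which is the sample-complexity phenomenon the abstract highlights.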
