ترغب بنشر مسار تعليمي؟ اضغط هنا

A Q-learning algorithm for discrete-time linear-quadratic control with random parameters of unknown distribution: convergence and stabilization

97   0   0.0 ( 0 )
 نشر من قبل Kai Du
 تاريخ النشر 2020
  مجال البحث
والبحث باللغة English




اسأل ChatGPT حول البحث

This paper studies an infinite horizon optimal control problem for discrete-time linear systems and quadratic criteria, both with random parameters which are independent and identically distributed with respect to time. A classical approach is to solve an algebraic Riccati equation that involves mathematical expectations and requires certain statistical information of the parameters. In this paper, we propose an online iterative algorithm in the spirit of Q-learning for the situation where only one random sample of parameters emerges at each time step. The first theorem proves the equivalence of three properties: the convergence of the learning sequence, the well-posedness of the control problem, and the solvability of the algebraic Riccati equation. The second theorem shows that the adaptive feedback control in terms of the learning sequence stabilizes the system as long as the control problem is well-posed. Numerical examples are presented to illustrate our results.



قيم البحث

اقرأ أيضاً

We provide an exhaustive treatment of Linear-Quadratic control problems for a class of stochastic Volterra equations of convolution type, whose kernels are Laplace transforms of certain signed matrix measures which are not necessarily finite. These e quations are in general neither Markovian nor semimartingales, and include the fractional Brownian motion with Hurst index smaller than $1/2$ as a special case. We establish the correspondence of the initial problem with a possibly infinite dimensional Markovian one in a Banach space, which allows us to identify the Markovian controlled state variables. Using a refined martingale verification argument combined with a squares completion technique, we prove that the value function is of linear quadratic form in these state variables with a linear optimal feedback control, depending on non-standard Banach space valued Riccati equations. Furthermore, we show that the value function of the stochastic Volterra optimization problem can be approximated by that of conventional finite dimensional Markovian Linear--Quadratic problems, which is of crucial importance for numerical implementation.
121 - Shanjian Tang 2014
We are concerned with the linear-quadratic optimal stochastic control problem with random coefficients. Under suitable conditions, we prove that the value field $V(t,x,omega), (t,x,omega)in [0,T]times R^ntimes Omega$, is quadratic in $x$, and has the following form: $V(t,x)=langle K_tx, xrangle$ where $K$ is an essentially bounded nonnegative symmetric matrix-valued adapted processes. Using the dynamic programming principle (DPP), we prove that $K$ is a continuous semi-martingale of the form $$K_t=K_0+int_0^t , dk_s+sum_{i=1}^dint_0^tL_s^i, dW_s^i, quad tin [0,T]$$ with $k$ being a continuous process of bounded variation and $$Eleft[left(int_0^T|L_s|^2, dsright)^pright] <infty, quad forall pge 2; $$ and that $(K, L)$ with $L:=(L^1, cdots, L^d)$ is a solution to the associated backward stochastic Riccati equation (BSRE), whose generator is highly nonlinear in the unknown pair of processes. The uniqueness is also proved via a localized completion of squares in a self-contained manner for a general BSRE. The existence and uniqueness of adapted solution to a general BSRE was initially proposed by the French mathematician J. M. Bismut (1976, 1978). It had been solved by the author (2003) via the stochastic maximum principle with a viewpoint of stochastic flow for the associated stochastic Hamiltonian system. The present paper is its companion, and gives the {it second but more comprehensive} adapted solution to a general BSRE via the DDP. Further extensions to the jump-diffusion control system and to the general nonlinear control system are possible.
In this paper, we consider a discrete-time stochastic control problem with uncertain initial and target states. We first discuss the connection between optimal transport and stochastic control problems of this form. Next, we formulate a linear-quadra tic regulator problem where the initial and terminal states are distributed according to specified probability densities. A closed-form solution for the optimal transport map in the case of linear-time varying systems is derived, along with an algorithm for computing the optimal map. Two numerical examples pertaining to swarm deployment demonstrate the practical applicability of the model, and performance of the numerical method.
We revisit Matrix Balancing, a pre-conditioning task used ubiquitously for computing eigenvalues and matrix exponentials. Since 1960, Osbornes algorithm has been the practitioners algorithm of choice and is now implemented in most numerical software packages. However, its theoretical properties are not well understood. Here, we show that a simple random variant of Osbornes algorithm converges in near-linear time in the input sparsity. Specifically, it balances $Kinmathbb{R}_{geq 0}^{ntimes n}$ after $O(mepsilon^{-2}logkappa)$ arithmetic operations, where $m$ is the number of nonzeros in $K$, $epsilon$ is the $ell_1$ accuracy, and $kappa=sum_{ij}K_{ij}/(min_{ij:K_{ij} eq 0}K_{ij})$ measures the conditioning of $K$. Previous work had established near-linear runtimes either only for $ell_2$ accuracy (a weaker criterion which is less relevant for applications), or through an entirely different algorithm based on (currently) impractical Laplacian solvers. We further show that if the graph with adjacency matrix $K$ is moderately connected--e.g., if $K$ has at least one positive row/column pair--then Osbornes algorithm initially converges exponentially fast, yielding an improved runtime $O(mepsilon^{-1}logkappa)$. We also address numerical precision by showing that these runtime bounds still hold when using $O(log(nkappa/epsilon))$-bit numbers. Our results are established through an intuitive potential argument that leverages a convex optimization perspective of Osbornes algorithm, and relates the per-iteration progress to the current imbalance as measured in Hellinger distance. Unlike previous analyses, we critically exploit log-convexity of the potential. Our analysis extends to other variants of Osbornes algorithm: along the way, we establish significantly improved runtime bounds for cyclic, greedy, and parallelized variants.
We propose a randomized algorithm with quadratic convergence rate for convex optimization problems with a self-concordant, composite, strongly convex objective function. Our method is based on performing an approximate Newton step using a random proj ection of the Hessian. Our first contribution is to show that, at each iteration, the embedding dimension (or sketch size) can be as small as the effective dimension of the Hessian matrix. Leveraging this novel fundamental result, we design an algorithm with a sketch size proportional to the effective dimension and which exhibits a quadratic rate of convergence. This result dramatically improves on the classical linear-quadratic convergence rates of state-of-the-art sub-sampled Newton methods. However, in most practical cases, the effective dimension is not known beforehand, and this raises the question of how to pick a sketch size as small as the effective dimension while preserving a quadratic convergence rate. Our second and main contribution is thus to propose an adaptive sketch size algorithm with quadratic convergence rate and which does not require prior knowledge or estimation of the effective dimension: at each iteration, it starts with a small sketch size, and increases it until quadratic progress is achieved. Importantly, we show that the embedding dimension remains proportional to the effective dimension throughout the entire path and that our method achieves state-of-the-art computational complexity for solving convex optimization programs with a strongly convex component.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا