
Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach

Posted by Yujie Tang
Publication date: 2019
Research language: English





This paper considers a distributed reinforcement learning problem for decentralized linear quadratic control with partial state observations and local costs. We propose a Zero-Order Distributed Policy Optimization algorithm (ZODPO) that learns linear local controllers in a distributed fashion, leveraging the ideas of policy gradient, zero-order optimization and consensus algorithms. In ZODPO, each agent estimates the global cost by consensus and then conducts a local policy gradient update in parallel based on zero-order gradient estimation. ZODPO requires only limited communication and storage even in large-scale systems. Further, we investigate the nonasymptotic performance of ZODPO and show that the sample complexity to approach a stationary point is polynomial in the inverse of the error tolerance and in the problem dimensions, demonstrating the scalability of ZODPO. We also show that the controllers generated throughout ZODPO are stabilizing controllers with high probability. Lastly, we numerically test ZODPO on multi-zone HVAC systems.
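As a rough illustration of the mechanism described in the abstract, the sketch below implements one ZODPO-style iteration in Python: each agent perturbs its own local controller, the agents run consensus on their observed local costs to approximate the global cost, and each agent then takes a zero-order policy-gradient step. The function names, the `simulate_system` rollout helper, and the specific one-point estimator are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def zodpo_step(K_local, simulate_system, W, eta=1e-3, r=0.1, n_consensus=10):
    """One zero-order distributed policy-gradient step (illustrative sketch).

    K_local         : list of each agent's local controller parameters (ndarrays)
    simulate_system : callable(list_of_controllers) -> array of local costs,
                      one entry per agent (assumed helper, not from the paper)
    W               : doubly stochastic consensus weight matrix (n_agents x n_agents)
    eta, r          : step size and perturbation radius
    """
    n_agents = len(K_local)

    # Each agent draws a unit-norm random perturbation of its own controller.
    U = [np.random.randn(*K.shape) for K in K_local]
    U = [u / np.linalg.norm(u) for u in U]
    K_pert = [K + r * u for K, u in zip(K_local, U)]

    # Roll out the system once with all perturbed controllers; each agent
    # observes only its own local cost.
    c = np.asarray(simulate_system(K_pert), dtype=float)

    # Consensus averaging so every agent approximates the global (average) cost.
    for _ in range(n_consensus):
        c = W @ c

    # Each agent forms a one-point zero-order gradient estimate and updates
    # its local controller in parallel.
    K_new = []
    for i in range(n_agents):
        grad_i = (K_local[i].size / r) * c[i] * U[i]
        K_new.append(K_local[i] - eta * grad_i)
    return K_new
```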




Read also

Qingrui Zhang, Hao Dong, Wei Pan (2020)
Decentralized multi-agent control has broad applications, ranging from multi-robot cooperation to distributed sensor networks. In decentralized multi-agent control, systems are complex with unknown or highly uncertain dynamics, where traditional model-based control methods can hardly be applied. Compared with model-based control in control theory, deep reinforcement learning (DRL) is promising for learning the controller/policy from data without knowing the system dynamics. However, directly applying DRL to decentralized multi-agent control is challenging, as interactions among agents make the learning environment non-stationary. More importantly, existing multi-agent reinforcement learning (MARL) algorithms cannot ensure the closed-loop stability of a multi-agent system from a control-theoretic perspective, so the learned control policies are likely to generate abnormal or dangerous behaviors in real applications. Hence, without a stability guarantee, the application of existing MARL algorithms to real multi-agent systems, e.g., UAVs, robots, and power systems, is of great concern. In this paper, we propose a new MARL algorithm for decentralized multi-agent control with a stability guarantee. The new algorithm, termed multi-agent soft actor-critic (MASAC), is proposed under the well-known framework of centralized training with decentralized execution. Closed-loop stability is guaranteed by introducing a stability constraint during the policy improvement in MASAC. The stability constraint is designed based on Lyapunov's method in control theory. Finally, a multi-agent navigation example is presented to demonstrate the effectiveness of the proposed MASAC algorithm.
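To make the role of the stability constraint concrete, the following sketch shows one generic way a Lyapunov decrease condition could be folded into a soft actor-critic style policy loss as a penalty. The networks `policy`, `q_net`, `lyap_net`, the multiplier `lam`, and the decrease rate are assumed placeholders; this is not the paper's actual MASAC formulation.

```python
import torch

def lyapunov_penalized_policy_loss(policy, q_net, lyap_net, states, next_states,
                                   alpha=0.2, lam=1.0, decrease_rate=0.05):
    """SAC-style policy loss with a Lyapunov penalty (illustrative sketch).

    policy.sample(states) -> (actions, log_probs)   # reparameterized, assumed API
    q_net(states, actions), lyap_net(states)        # assumed critic / Lyapunov nets
    """
    actions, log_probs = policy.sample(states)
    # Standard SAC policy-improvement term: entropy-regularized Q maximization.
    sac_term = (alpha * log_probs - q_net(states, actions)).mean()
    # Penalize violations of the decrease condition E[L(s') - L(s)] <= -rate * E[L(s)].
    L_s, L_next = lyap_net(states), lyap_net(next_states)
    violation = (L_next - L_s + decrease_rate * L_s).clamp(min=0).mean()
    return sac_term + lam * violation
```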
Feiran Zhao, Keyou You (2020)
Risk-aware control, though promising for tackling unexpected events, requires a known exact dynamical model. In this work, we propose a model-free framework to learn a risk-aware controller, with a focus on linear systems. We formulate the problem as a discrete-time infinite-horizon LQR problem with a state predictive variance constraint. To solve it, we parameterize the policy with a feedback gain pair and leverage primal-dual methods to optimize it using data alone. We first study the optimization landscape of the Lagrangian function and establish strong duality despite its non-convex nature. Moreover, we find that the Lagrangian function enjoys an important local gradient dominance property, which we then exploit to develop a convergent random search algorithm to learn the dual function. Furthermore, we propose a primal-dual algorithm with global convergence to learn the optimal policy-multiplier pair. Finally, we validate our results via simulations.
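A highly simplified sketch of the primal-dual, random-search structure described above is given below. The `rollout` routine (returning data-based estimates of the LQR cost and the predictive variance), the step sizes, and the two-point gradient estimator are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def primal_dual_random_search(K0, rollout, var_budget,
                              outer=100, inner=50, r=0.05,
                              eta_K=1e-3, eta_lam=1e-2):
    """rollout(K) -> (estimated LQR cost, estimated predictive variance)."""
    K, lam = K0.copy(), 0.0
    for _ in range(outer):
        # Primal step: random search on the Lagrangian for the current multiplier.
        for _ in range(inner):
            U = np.random.randn(*K.shape)
            U /= np.linalg.norm(U)
            cost_p, var_p = rollout(K + r * U)
            cost_m, var_m = rollout(K - r * U)
            lag_p = cost_p + lam * (var_p - var_budget)
            lag_m = cost_m + lam * (var_m - var_budget)
            # Two-point zeroth-order gradient estimate of the Lagrangian in K.
            grad_K = (K.size * (lag_p - lag_m) / (2 * r)) * U
            K -= eta_K * grad_K
        # Dual step: projected gradient ascent on the multiplier.
        _, var = rollout(K)
        lam = max(0.0, lam + eta_lam * (var - var_budget))
    return K, lam
```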
The linear-quadratic controller is one of the fundamental problems in control theory. The optimal solution is a linear controller that requires access to the state of the entire system at any given time. When considering a network system, this renders the optimal controller a centralized one. The interconnected nature of a network system often demands a distributed controller, where different components of the system are controlled based only on local information. Unlike the classical centralized case, obtaining the optimal distributed controller is usually an intractable problem. Thus, we adopt a graph neural network (GNN) as a parametrization of distributed controllers. GNNs are naturally local and have distributed architectures, making them well suited for learning nonlinear distributed controllers. By casting the linear-quadratic problem as a self-supervised learning problem, we are able to find the best GNN-based distributed controller. We also derive sufficient conditions for the resulting closed-loop system to be stable. We run extensive simulations to study the performance of GNN-based distributed controllers and show that they are a computationally efficient parametrization with scalability and transferability capabilities.
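To illustrate the idea of parameterizing a distributed controller with a GNN and training it self-supervised on the LQ cost, here is a minimal sketch using a single linear graph-filter layer as a stand-in for a full GNN. The graph shift operator `S`, the system matrices (with `B` assumed square so there is one input per node), and the rollout horizon are all illustrative assumptions.

```python
import torch

class GraphFilterController(torch.nn.Module):
    """Distributed controller u = sum_k h_k * S^k x; node i uses only k-hop information."""
    def __init__(self, S, n_taps=3):
        super().__init__()
        self.S = S                                      # graph shift operator (n x n)
        self.h = torch.nn.Parameter(0.01 * torch.randn(n_taps))  # shared filter taps

    def forward(self, x):                               # x: (n,) stacked node states
        u, Skx = torch.zeros_like(x), x
        for k in range(len(self.h)):
            u = u + self.h[k] * Skx
            Skx = self.S @ Skx                          # next power of the shift operator
        return u

def lq_rollout_cost(ctrl, A, B, Q, R, x0, T=50):
    """Self-supervised loss: accumulated LQ cost of the closed loop (differentiable)."""
    x, cost = x0, 0.0
    for _ in range(T):
        u = ctrl(x)
        cost = cost + x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost
```

Gradient descent on `lq_rollout_cost` with respect to the filter taps then plays the role of the self-supervised training loop.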
We consider a discrete-time linear-quadratic Gaussian control problem in which we minimize a weighted sum of the directed information from the state of the system to the control input and the control cost. The optimal control and sensing policies can be synthesized jointly by solving a semidefinite programming problem. However, the existing solutions typically scale cubically with the horizon length. We leverage the structure in the problem to develop a distributed algorithm that decomposes the synthesis problem into a set of smaller problems, one for each time step. We prove that the algorithm runs in time linear in the horizon length. As an application of the algorithm, we consider a path-planning problem in a state space with obstacles under the presence of stochastic disturbances. The algorithm computes a locally optimal solution that jointly minimizes the perception and control cost while ensuring the safety of the path. The numerical examples show that the algorithm can scale to horizon lengths in the thousands and compute locally optimal solutions.
In this paper, we propose a new method based on the Sliding Algorithm of Lan (2016, 2019) for the convex composite optimization problem that includes two terms: a smooth one and a non-smooth one. Our method uses a stochastic noisy zeroth-order oracle for the non-smooth part and a first-order oracle for the smooth part. To the best of our knowledge, this is the first method in the literature that uses such a mixed oracle for composite optimization. We prove a convergence rate for the new method that matches the corresponding rate for the first-order method up to a factor proportional to the dimension of the space or, in some cases, its squared logarithm. We apply this method to decentralized distributed optimization and derive upper bounds on the number of communication rounds that match known lower bounds. Moreover, our bound on the number of zeroth-order oracle calls per node matches the corresponding state-of-the-art bound for first-order decentralized distributed optimization, up to a factor proportional to the dimension of the space or, in some cases, its squared logarithm.
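A bare-bones illustration of the mixed-oracle idea (not the full sliding scheme) is sketched below: the smooth part is handled through its first-order oracle, while the non-smooth part is handled through a Gaussian-smoothing zeroth-order estimate. All names, step sizes, and the smoothing parameter are assumptions for illustration.

```python
import numpy as np

def mixed_oracle_step(x, grad_f, h_value, gamma=0.1, tau=1e-3,
                      rng=np.random.default_rng()):
    """One (simplified) step toward minimizing f(x) + h(x).

    grad_f  : first-order oracle for the smooth term f
    h_value : (noisy) zeroth-order oracle for the non-smooth term h
    """
    d = x.size
    # Gaussian-smoothing zeroth-order estimate of a gradient of h.
    u = rng.standard_normal(d)
    g_h = ((h_value(x + tau * u) - h_value(x)) / tau) * u
    # First-order information for the smooth part, zeroth-order for the rest.
    return x - gamma * (grad_f(x) + g_h)
```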

