Distributed Algorithms for Linearly-Solvable Optimal Control in Networked Multi-Agent Systems

Published by: Neng Wan

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Neng Wan, Aditya Gahlawat, Naira Hovakimyan

Abstract

Distributed algorithms for both discrete-time and continuous-time linearly solvable optimal control (LSOC) problems of networked multi-agent systems (MASs) are investigated in this paper. A distributed framework is proposed to partition the optimal control problem of a networked MAS into several local optimal control problems in factorial subsystems, such that each (central) agent behaves optimally to minimize the joint cost function of a subsystem that comprises a central agent and its neighboring agents, and the local control actions (policies) rely only on the knowledge of local observations. Under this framework, we not only preserve the correlations between neighboring agents, but also reduce the communication and computational complexities by decentralizing the sampling and computational processes over the network. For discrete-time systems modeled by Markov decision processes, the joint Bellman equation of each subsystem is transformed into a system of linear equations and solved using parallel programming. For continuous-time systems modeled by Itô diffusion processes, the joint optimality equation of each subsystem is converted into a linear partial differential equation, whose solution is approximated either by a path integral formulation or by a sample-efficient relative entropy policy search algorithm. The learned control policies are generalized to solve unlearned tasks by resorting to the compositionality principle, and illustrative examples of cooperative UAV teams are provided to verify the effectiveness and advantages of these algorithms.
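To make the discrete-time step concrete, here is a minimal single-agent sketch of how a Bellman equation becomes a linear system. It assumes the standard first-exit linearly solvable MDP formulation (state cost q, passive dynamics P, desirability z = exp(-v)), which the abstract does not spell out. The function name, the toy chain and all parameter values are illustrative assumptions, and nothing here reproduces the paper's distributed, per-subsystem algorithm.

```python
import numpy as np

def solve_first_exit_lsmdp(P, q, terminal):
    """Solve a first-exit linearly solvable MDP (illustrative sketch).

    P        : (n, n) passive (uncontrolled) transition matrix, rows sum to 1
    q        : (n,) state costs
    terminal : (n,) boolean mask of absorbing goal states
    Returns the desirability z = exp(-v) and the optimal transition matrix U.
    """
    P = np.asarray(P, dtype=float)
    q = np.asarray(q, dtype=float)
    terminal = np.asarray(terminal, dtype=bool)
    interior = ~terminal

    z = np.empty(q.shape[0])
    z[terminal] = np.exp(-q[terminal])  # boundary condition at goal states

    # Interior states satisfy z_I = G (P_II z_I + P_IT z_T), G = diag(exp(-q_I)),
    # which is linear in z_I: (I - G P_II) z_I = G P_IT z_T.
    G = np.diag(np.exp(-q[interior]))
    A = np.eye(int(interior.sum())) - G @ P[np.ix_(interior, interior)]
    b = G @ P[np.ix_(interior, terminal)] @ z[terminal]
    z[interior] = np.linalg.solve(A, b)

    # Optimal controlled dynamics reweight the passive dynamics by desirability.
    U = P * z[None, :]
    U /= U.sum(axis=1, keepdims=True)
    return z, U

# Toy problem (assumed for illustration): a 5-state random walk with a goal at state 4.
n = 5
P = np.zeros((n, n))
P[0, 0] = P[0, 1] = 0.5
for i in range(1, n - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[n - 1, n - 1] = 1.0
q = np.array([0.2, 0.2, 0.2, 0.2, 0.0])
terminal = np.array([False, False, False, False, True])

z, U = solve_first_exit_lsmdp(P, q, terminal)
print("desirability:", np.round(z, 4))
print("optimal P(2 -> 3):", round(U[2, 3], 4), "vs passive", P[2, 3])
```

In this toy chain the desirability grows toward the goal, so the optimal dynamics shift probability mass toward state 4 compared with the passive random walk.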

Yiheng Lin, Guannan Qu, Longbo Huang, 2020

We study multi-agent reinforcement learning (MARL) in a time-varying network of agents. The objective is to find localized policies that maximize the (discounted) global reward. In general, scalability is a challenge in this setting because the size of the global state/action space can be exponential in the number of agents. Scalable algorithms are only known in cases where dependencies are static, fixed and local, e.g., between neighbors in a fixed, time-invariant underlying graph. In this work, we propose a Scalable Actor Critic framework that applies in settings where the dependencies can be non-local and time-varying, and provide a finite-time error bound that shows how the convergence rate depends on the speed of information spread in the network. Additionally, as a byproduct of our analysis, we obtain novel finite-time convergence results for a general stochastic approximation scheme and for temporal difference learning with state aggregation, which apply beyond the setting of RL in networked systems.

Machine learning, Multi-agent systems

Multi-agent Reinforcement Learning for Networked System Control

Tianshu Chu, Sandeep Chinchali, Sachin Katti, 2020

This paper considers multi-agent reinforcement learning (MARL) in networked system control. Specifically, each agent learns a decentralized control policy based on local observations and messages from connected neighbors. We formulate such a networked MARL (NMARL) problem as a spatiotemporal Markov decision process and introduce a spatial discount factor to stabilize the training of each local agent. Further, we propose a new differentiable communication protocol, called NeurComm, to reduce information loss and non-stationarity in NMARL. Based on experiments in realistic NMARL scenarios of adaptive traffic signal control and cooperative adaptive cruise control, an appropriate spatial discount factor effectively enhances the learning curves of non-communicative MARL algorithms, while NeurComm outperforms existing communication protocols in both learning efficiency and control performance.
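The abstract does not give a formula for the spatial discount factor. The sketch below shows one plausible reading, stated as an assumption: each agent's training reward sums the rewards of all reachable agents, weighted by `alpha` raised to the hop distance in the communication graph. The function names, the adjacency-list format and the example graph are hypothetical.

```python
from collections import deque

def hop_distances(adj, src):
    """Breadth-first search: hop distance from src to every reachable agent."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def spatially_discounted_reward(adj, rewards, agent, alpha):
    """Sum rewards over the network, scaling agent j's reward by alpha**hops(agent, j).

    alpha = 0 keeps only the agent's own reward (fully local);
    alpha = 1 recovers the undiscounted global reward.
    """
    dist = hop_distances(adj, agent)
    return sum(alpha ** d * rewards[j] for j, d in dist.items())

# Hypothetical line graph 0 - 1 - 2 with per-agent rewards.
adj = {0: [1], 1: [0, 2], 2: [1]}
rewards = [1.0, 2.0, 4.0]
print(spatially_discounted_reward(adj, rewards, agent=0, alpha=0.5))
```

Under this reading, a smaller `alpha` makes each agent's learning signal depend mostly on its own neighborhood, which is one way such a factor could stabilize local training.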

Machine learning

Prescribed Performance Distance-Based Formation Control of Multi-Agent Systems (Extended Version)

Farhad Mehdifar, Charalampos P. Bechlioulis, Farzad Hashemzadeh, 2019

This paper presents a novel control protocol for robust distance-based formation control with prescribed performance in which agents are subjected to unknown external disturbances. Connectivity maintenance and collision avoidance among neighboring agents are also handled by the appropriate design of certain performance bounds that constrain the inter-agent distance errors. As an extension to the proposed scheme, distance-based formation centroid maneuvering is also studied for disturbance-free agents, in which the formation centroid tracks a desired time-varying velocity. The proposed control laws are decentralized, in the sense that each agent employs local relative information regarding its neighbors to calculate its control signal. Therefore, the control scheme is implementable in the agents' local coordinate frames. Using rigid graph theory, input-to-state stability, and Lyapunov-based analysis, the results are established for minimally and infinitesimally rigid formations in 2-D or 3-D space. Furthermore, it is argued that the proposed approach increases formation robustness against shape distortions and can prevent formation convergence to incorrect shapes, which is likely to happen in conventional distance-based formation control methods. Finally, extensive simulation studies clarify and verify the proposed approach.

Systems and control, Multi-agent systems, Robotics

Distributed sampled-data control of nonholonomic multi-robot systems with proximity networks

Zhixin Liu, Lin Wang, Jinhuan Wang, 2016

This paper considers the distributed sampled-data control problem of a group of mobile robots connected via distance-induced proximity networks. A dwell time is assumed in order to avoid chattering in the neighbor relations that may be caused by abrupt changes of positions when updating information from neighbors. Distributed sampled-data control laws are designed based on nearest-neighbour rules, which in conjunction with continuous-time dynamics result in hybrid closed-loop systems. For uniformly and independently distributed initial states, a sufficient condition is provided to guarantee synchronization for the system without leaders. In order to steer all robots to move with the desired orientation and speed, we then introduce a number of leaders into the system, and quantitatively establish the proportion of leaders needed to track either constant or time-varying signals. All these conditions depend only on the neighborhood radius, the maximum initial moving speed and the dwell time, without assuming a priori properties of the neighbor graphs as are used in most of the existing literature.

Systems and control, Multi-agent systems, Robotics

Mava: a research framework for distributed multi-agent reinforcement learning

Arnu Pretorius, Kale-ab Tessera, Andries P. Smit, 2021

Breakthrough advances in reinforcement learning (RL) research have led to a surge in the development and application of RL. To support the field and its rapid growth, several frameworks have emerged that aim to help the community more easily build effective and scalable agents. However, very few of these frameworks exclusively support multi-agent RL (MARL), an increasingly active field in itself, concerned with decentralised decision-making problems. In this work, we attempt to fill this gap by presenting Mava: a research framework specifically designed for building scalable MARL systems. Mava provides useful components, abstractions, utilities and tools for MARL and allows for simple scaling for multi-process system training and execution, while providing a high level of flexibility and composability. Mava is built on top of DeepMind's Acme (Hoffman et al., 2020), and therefore integrates with, and greatly benefits from, a wide range of already existing single-agent RL components made available in Acme. Several MARL baseline systems have already been implemented in Mava. These implementations serve as examples showcasing Mava's reusable features, such as interchangeable system architectures, communication and mixing modules. Furthermore, these implementations allow existing MARL algorithms to be easily reproduced and extended. We provide experimental results for these implementations on a wide range of multi-agent environments and highlight the benefits of distributed system training.

Machine learning, Multi-agent systems