Fully Decentralized Policies for Multi-Agent Systems: An Information Theoretic Approach

Published by: David Fridovich-Keil

Publication date: 2017

Research field: Informatics Engineering

Paper language: English

Authors: Roel Dobbe, David Fridovich-Keil, Claire Tomlin

Systems and Control; Artificial Intelligence; Information Theory

Learning cooperative policies for multi-agent systems is often challenged by partial observability and a lack of coordination. In some settings, the structure of a problem allows a distributed solution with limited communication. Here, we consider a scenario where no communication is available, and instead we learn local policies for all agents that collectively mimic the solution to a centralized multi-agent static optimization problem. Our main contribution is an information theoretic framework based on rate distortion theory which facilitates analysis of how well the resulting fully decentralized policies are able to reconstruct the optimal solution. Moreover, this framework provides a natural extension that addresses which nodes an agent should communicate with to improve the performance of its individual policy.
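To give a feel for the kind of bound that rate distortion theory provides, the sketch below works through the classical scalar Gaussian case. It is an illustration under stated assumptions, not the paper's own formulation: the optimal action and the local observation are assumed to be jointly Gaussian scalars with correlation `rho`, and all function names are hypothetical. In that case the mutual information between them is -0.5 * log2(1 - rho^2) bits, and the Gaussian distortion-rate function D(R) = variance * 2^(-2R) gives the smallest mean squared error any policy based on that observation could achieve.

```python
import math


def gaussian_mutual_information_bits(rho: float) -> float:
    """Mutual information (in bits) between two jointly Gaussian scalars
    with correlation coefficient rho. Assumes |rho| < 1."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * math.log2(1.0 - rho * rho)


def gaussian_distortion_rate(variance: float, rate_bits: float) -> float:
    """Distortion-rate function of a Gaussian source under squared error:
    D(R) = variance * 2^(-2R), with R measured in bits."""
    return variance * 2.0 ** (-2.0 * rate_bits)


def distortion_lower_bound(variance: float, rho: float) -> float:
    """Smallest achievable mean squared error when the optimal action
    (with the given variance) is reconstructed from an observation that
    shares gaussian_mutual_information_bits(rho) bits with it."""
    rate = gaussian_mutual_information_bits(rho)
    return gaussian_distortion_rate(variance, rate)


if __name__ == "__main__":
    # Hypothetical numbers: action variance 4.0, correlation 0.5.
    # The bound simplifies to variance * (1 - rho^2), i.e. roughly 3.0.
    print(distortion_lower_bound(4.0, 0.5))
```

An uninformative observation (`rho = 0`) leaves the bound at the full variance, while a more strongly correlated observation lowers it. The same reasoning suggests why access to an additional, informative node could improve an agent's individual policy.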

648 - Mauro Franceschelli, Andrea Gasparri, Alessandro Giua 2012

In this paper we present a decentralized algorithm to estimate the eigenvalues of the Laplacian matrix that encodes the network topology of a multi-agent system. We consider network topologies modeled by undirected graphs. The basic idea is to provide a local interaction rule among agents so that their state trajectory is a linear combination of sinusoids oscillating only at frequencies that are functions of the eigenvalues of the Laplacian matrix. In this way, the problem of decentralized estimation of the eigenvalues is mapped into a standard signal processing problem in which the unknowns are the finite number of frequencies at which the signal oscillates.

Systems and Control

An Information-Theoretic Perspective on Overfitting and Underfitting

113 - Daniel Bashir, George D. Montanez, Sonia Sehra 2020

We present an information-theoretic framework for understanding overfitting and underfitting in machine learning and prove the formal undecidability of determining whether an arbitrary classification algorithm will overfit a dataset. Measuring algorithm capacity via the information transferred from datasets to models, we consider mismatches between algorithm capacities and datasets to provide a signature for when a model can overfit or underfit a dataset. We present results upper-bounding algorithm capacity, establish its relationship to quantities in the algorithmic search framework for machine learning, and relate our work to recent information-theoretic approaches to generalization.

Machine Learning; Artificial Intelligence; Information Theory

Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems

303 - Guannan Qu, Adam Wierman, Na Li 2019

We study reinforcement learning (RL) in a setting with a network of agents whose states and actions interact in a local manner, where the objective is to find localized policies such that the (discounted) global reward is maximized. A fundamental challenge in this setting is that the state-action space size scales exponentially in the number of agents, rendering the problem intractable for large networks. In this paper, we propose a Scalable Actor-Critic (SAC) framework that exploits the network structure and finds a localized policy that is an $O(\rho^\kappa)$-approximation of a stationary point of the objective for some $\rho \in (0,1)$, with complexity that scales with the local state-action space size of the largest $\kappa$-hop neighborhood of the network.
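The scaling argument in this abstract can be made concrete with a small sketch. The code below is not the SAC algorithm. It only illustrates, under assumed toy inputs, what a kappa-hop neighborhood is and how the local joint state-action space compares with the global one. The graph representation, the function names, and the six-agent ring are all hypothetical.

```python
from collections import deque


def k_hop_neighborhood(adjacency, source, kappa):
    """Return the set of nodes within kappa hops of source (source included),
    found by breadth-first search over an adjacency-list graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == kappa:
            continue  # do not expand beyond kappa hops
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return set(dist)


def largest_neighborhood_size(adjacency, kappa):
    """Number of agents in the largest kappa-hop neighborhood of the network."""
    return max(len(k_hop_neighborhood(adjacency, n, kappa)) for n in adjacency)


def joint_space_size(pairs_per_agent, num_agents):
    """Size of the joint state-action space for num_agents agents, assuming
    each agent has pairs_per_agent state-action pairs."""
    return pairs_per_agent ** num_agents


if __name__ == "__main__":
    # Hypothetical network: a ring of 6 agents, each with 4 state-action pairs.
    ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    local_agents = largest_neighborhood_size(ring, kappa=1)
    print("local joint space:", joint_space_size(4, local_agents))
    print("global joint space:", joint_space_size(4, len(ring)))
```

For a fixed kappa, the local joint space depends only on the neighborhood size, whereas the global joint space grows exponentially with the total number of agents.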

Optimization and Control; Artificial Intelligence; Machine Learning

Learning Pareto-Frontier Resource Management Policies for Heterogeneous SoCs: An Information-Theoretic Approach

171 - Aryan Deshwal, Syrine Belakaria, Ganapati Bhat 2021

Mobile system-on-chips (SoCs) are growing in their complexity and heterogeneity (e.g., Arm's Big-Little architecture) to meet the needs of emerging applications, including games and artificial intelligence. This makes it very challenging to optimally manage the resources (e.g., controlling the number and frequency of different types of cores) at runtime to meet the desired trade-offs among multiple objectives such as performance and energy. This paper proposes a novel information-theoretic framework referred to as PaRMIS to create Pareto-optimal resource management policies for given target applications and design objectives. PaRMIS specifies parametric policies to manage resources and learns statistical models from candidate policy evaluation data in the form of target design objective values. The key idea is to select a candidate policy for evaluation in each iteration guided by statistical models that maximize the information gain about the true Pareto front. Experiments on a commercial heterogeneous SoC show that PaRMIS achieves better Pareto fronts and is easily usable to optimize complex objectives (e.g., performance per Watt) when compared to prior methods.

Hardware Architecture; Distributed, Parallel, and Cluster Computing; Systems and Control

A Game-Theoretic Approach to Multi-Agent Trust Region Optimization

189 - Ying Wen, Hui Chen, Yaodong Yang 2021

Trust region methods are widely applied in single-agent reinforcement learning problems due to their monotonic performance-improvement guarantee at every iteration. Nonetheless, when applied in multi-agent settings, the guarantee of trust region methods no longer holds because an agent's payoff is also affected by other agents' adaptive behaviors. To tackle this problem, we conduct a game-theoretical analysis in the policy space, and propose a multi-agent trust region learning method (MATRL), which enables trust region optimization for multi-agent learning. Specifically, MATRL finds a stable improvement direction that is guided by the solution concept of Nash equilibrium at the meta-game level. We derive the monotonic improvement guarantee in multi-agent settings and empirically show the local convergence of MATRL to stable fixed points in the two-player rotational differential game. To test our method, we evaluate MATRL in both discrete and continuous multiplayer general-sum games including checker and switch grid worlds, multi-agent MuJoCo, and Atari games. Results suggest that MATRL significantly outperforms strong multi-agent reinforcement learning baselines.

Multiagent Systems; Artificial Intelligence; Machine Learning