Birds of a Feather Flock Together: A Close Look at Cooperation Emergence via Multi-Agent RL

Published by Tonghan Wang

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Heng Dong - Tonghan Wang - Jiayuan Liu

Abstract

How cooperation emerges is a long-standing and interdisciplinary problem. Game-theoretical studies on social dilemmas reveal that altruistic incentives are critical to the emergence of cooperation but their analyses are limited to stateless games. For more realistic scenarios, multi-agent reinforcement learning has been used to study sequential social dilemmas (SSDs). Recent works show that learning to incentivize other agents can promote cooperation in SSDs. However, we find that, with these incentivizing mechanisms, the team cooperation level does not converge and regularly oscillates between cooperation and defection during learning. We show that a second-order social dilemma resulting from the incentive mechanisms is the main reason for such fragile cooperation. We formally analyze the dynamics of second-order social dilemmas and find that a typical tendency of humans, called homophily, provides a promising solution. We propose a novel learning framework to encourage homophilic incentives and show that it achieves stable cooperation in both SSDs of public goods and tragedy of the commons.
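To make the notion of a social dilemma concrete, here is a minimal sketch of a one-shot linear public goods game. It is only an illustration, not the sequential environments or the learning framework used in the paper; the function name `public_goods_payoffs` and the parameter values (endowment 1.0, multiplier 1.6, four agents) are assumptions chosen for the example.

```python
def public_goods_payoffs(contributions, endowment=1.0, multiplier=1.6):
    """Payoff of each agent in a one-shot linear public goods game.

    Hypothetical illustration: every agent keeps what it does not
    contribute, and the multiplied pot is split equally among all agents.
    """
    n = len(contributions)
    share = multiplier * sum(contributions) / n  # equal share of the pot
    return [endowment - c + share for c in contributions]


# Three scenarios with four agents (assumed values, not from the paper).
all_cooperate = public_goods_payoffs([1.0, 1.0, 1.0, 1.0])
one_defects = public_goods_payoffs([0.0, 1.0, 1.0, 1.0])
all_defect = public_goods_payoffs([0.0, 0.0, 0.0, 0.0])

for name, payoffs in [
    ("all cooperate", all_cooperate),
    ("one defects", one_defects),
    ("all defect", all_defect),
]:
    print(name, [round(p, 2) for p in payoffs], "total:", round(sum(payoffs), 2))
```

With these assumed values, a lone defector earns more than it would by cooperating, yet total welfare is highest when everyone contributes. That tension between individual and collective interest is what incentive mechanisms try to resolve.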

Ying Wen, Hui Chen, Yaodong Yang (2021)

Trust region methods are widely applied in single-agent reinforcement learning problems due to their monotonic performance-improvement guarantee at every iteration. Nonetheless, when applied in multi-agent settings, the guarantee of trust region methods no longer holds because an agent's payoff is also affected by other agents' adaptive behaviors. To tackle this problem, we conduct a game-theoretical analysis in the policy space, and propose a multi-agent trust region learning method (MATRL), which enables trust region optimization for multi-agent learning. Specifically, MATRL finds a stable improvement direction that is guided by the solution concept of Nash equilibrium at the meta-game level. We derive the monotonic improvement guarantee in multi-agent settings and empirically show the local convergence of MATRL to stable fixed points in the two-player rotational differential game. To test our method, we evaluate MATRL in both discrete and continuous multiplayer general-sum games including checker and switch grid worlds, multi-agent MuJoCo, and Atari games. Results suggest that MATRL significantly outperforms strong multi-agent reinforcement learning baselines.

Multi-agent systems, Artificial intelligence, Machine learning

Automatic Calibration of Dynamic and Heterogeneous Parameters in Agent-based Model

Dongjun Kim, Tae-Sub Yun, Il-Chul Moon (2019)

While simulations have been utilized in diverse domains, such as urban growth modeling and market dynamics modeling, some of these applications may also require validation against real-world observations modeled in the simulation. This validation has been categorized into either qualitative face validation or quantitative empirical validation, but as the importance and accumulation of data grow, recent studies, such as those on digital twins, have highlighted the importance of quantitative validation. The key component of quantitative validation is finding a calibrated set of parameters to regenerate the real-world observations with simulation models. While this parameter calibration has been fixed throughout a simulation execution, this paper expands the static parameter calibration in two dimensions: dynamic calibration and heterogeneous calibration. First, dynamic calibration changes the parameter values over the simulation period by reflecting the simulation output trend. Second, heterogeneous calibration changes the parameter values per simulated entity cluster by considering the similarities of entity states. We tested the suggested calibrations on one hypothetical case and one real-world case. As a hypothetical scenario, we use the Wealth Distribution Model to illustrate how our calibration works. As a real-world scenario, we selected the Real Estate Market Model for three reasons. First, the models have heterogeneous entities, being agent-based models; second, they are economic models with real-world trends over time; and third, they are applicable to real-world scenarios where we can gather validation data.

Multi-agent systems, Computers and society, Machine learning

Birds of a Feather: Capturing Avian Shape Models from Images

Yufu Wang, Nikos Kolotouros, Kostas Daniilidis (2021)

Animals are diverse in shape, but building a deformable shape model for a new species is not always possible due to the lack of 3D data. We present a method to capture new species using an articulated template and images of that species. In this work, we focus mainly on birds. Although birds represent almost twice the number of species as mammals, no accurate shape model is available. To capture a novel species, we first fit the articulated template to each training sample. By disentangling pose and shape, we learn a shape space that captures variation both among species and within each species from image evidence. We learn models of multiple species from the CUB dataset, and contribute new species-specific and multi-species shape models that are useful for downstream reconstruction tasks. Using a low-dimensional embedding, we show that our learned 3D shape space better reflects the phylogenetic relationships among birds than learned perceptual features.

Computer vision and pattern recognition

STMARL: A Spatio-Temporal Multi-Agent Reinforcement Learning Approach for Cooperative Traffic Light Control

Yanan Wang, Tong Xu, Xin Niu (2019)

The development of intelligent traffic light control systems is essential for smart transportation management. While some efforts have been made to optimize individual traffic lights in isolation, related studies have largely ignored two facts: traffic lights at multiple intersections influence one another spatially, and current traffic light control depends temporally on historical traffic status. To that end, in this paper, we propose a novel Spatio-Temporal Multi-Agent Reinforcement Learning (STMARL) framework for effectively capturing the spatio-temporal dependency of multiple related traffic lights and controlling these traffic lights in a coordinated way. Specifically, we first construct the traffic light adjacency graph based on the spatial structure among traffic lights. Then, historical traffic records are integrated with current traffic status via a Recurrent Neural Network structure. Moreover, based on the temporally dependent traffic information, we design a Graph Neural Network based model to represent relationships among multiple traffic lights, and the decision for each traffic light is made in a distributed way by the deep Q-learning method. Finally, the experimental results on both synthetic and real-world data demonstrate the effectiveness of our STMARL framework, which also provides an insightful understanding of the influence mechanism among multi-intersection traffic lights.

Multi-agent systems, Artificial intelligence, Machine learning

Birds of a feather or opposites attract - effects in network modelling

Maria Deijfen, Robert Fitzner (2016)

We study properties of some standard network models when the population is split into two types and the connection pattern between the types is varied. The studied models are generalizations of the Erdős-Rényi graph, the configuration model and a preferential attachment graph. For the Erdős-Rényi graph and the configuration model, the focus is on the component structure. We derive expressions for the critical parameter, indicating when there is a giant component in the graph, and study the size of the largest component with the aid of simulations. When the expected degrees in the graph are fixed and the connections are shifted so that more edges connect vertices of different types, we find that the critical parameter decreases. The size of the largest component in the supercritical regime can be both increasing and decreasing as the connections change, depending on the combination of types. For the preferential attachment model, we analyze the degree distributions of the two types and derive explicit expressions for the degree exponents. The exponents are confirmed by simulations that also illustrate other properties of the degree structure.

Probability, Social and information networks, Physics and society