This paper introduces a microscopic approach to modeling epidemics that explicitly considers the consequences of individuals' decisions on the spread of the disease. We first formulate a microscopic multi-agent epidemic model in which every agent can choose its activity level, which in turn affects the spread of the disease. Then, by minimizing agents' cost functions, we solve for the optimal decisions of individual agents in the framework of game theory and multi-agent reinforcement learning. Given the optimal decisions of all agents, we can make predictions about the spread of the disease. We show that there are negative externalities, in the sense that infected agents do not have enough incentive to protect others, which necessitates external interventions to regulate agents' behaviors. The discussion section points out future directions for making the model more realistic.
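As a rough illustration of how individual activity levels can drive the spread in a microscopic model, the following minimal sketch simulates a small population. The contact rule (a susceptible agent `i` is infected by an infected agent `j` with probability `beta * activity[i] * activity[j]`), the names `step` and `simulate`, and the parameter `beta` are assumptions made for this example; they are not taken from the paper, and no cost minimization is performed here.

```python
import random


def step(infected, activity, beta, rng):
    """Advance the population by one time step.

    Assumed contact rule (hypothetical): susceptible agent i is infected by
    infected agent j with probability beta * activity[i] * activity[j].
    """
    n = len(infected)
    new_infected = list(infected)
    for i in range(n):
        if infected[i]:
            continue  # already infected, nothing to do
        for j in range(n):
            if infected[j] and rng.random() < beta * activity[i] * activity[j]:
                new_infected[i] = True
                break
    return new_infected


def simulate(n_agents, activity, beta, steps, seed=0):
    """Return the number of infected agents at each step, starting from one case."""
    rng = random.Random(seed)
    infected = [i == 0 for i in range(n_agents)]  # agent 0 is the initial case
    counts = [sum(infected)]
    for _ in range(steps):
        infected = step(infected, activity, beta, rng)
        counts.append(sum(infected))
    return counts


if __name__ == "__main__":
    # Lower activity levels slow the spread under the assumed contact rule.
    print(simulate(50, [1.0] * 50, beta=0.05, steps=10))
    print(simulate(50, [0.3] * 50, beta=0.05, steps=10))
```

In this sketch the activity levels are fixed inputs; in the approach described above they would instead be the decisions each agent optimizes.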
Microscopic epidemic models, which can capture the impact of individual behaviors on macroscopic phenomena, are powerful tools for government policy makers to predict and simulate epidemic outbreaks. However, existing models only consider simple rule-based individual behaviors, limiting their applicability. This paper proposes a deep-reinforcement-learning-powered microscopic model named Microscopic Pandemic Simulator (MPS). By replacing rule-based agents with rational agents whose behaviors are driven to maximize rewards, the MPS provides a better approximation of real-world dynamics. To simulate massive numbers of agents efficiently in MPS, we propose Scalable Million-Agent DQN (SMADQN). The MPS allows us to efficiently evaluate the impact of different government strategies. This paper first calibrates the MPS against real-world data in Allegheny, US, and then, as a demonstration, evaluates two government strategies: information disclosure and quarantine. The results validate the effectiveness of the proposed method. More broadly, this paper provides novel insights for the application of DRL in large-scale agent-based networks such as economic and social networks.
Pursuit-evasion is the problem of capturing mobile targets with one or more pursuers. We use deep reinforcement learning for pursuing an omni-directional target with multiple homogeneous agents that are subject to unicycle kinematic constraints. We use shared experience to train a policy for a given number of pursuers that is executed independently by each agent at run-time. The training benefits from curriculum learning, a sweeping-angle ordering to locally represent neighboring agents, and a reward structure that combines individual and group rewards to encourage good formations. Simulated experiments with a reactive evader and up to eight pursuers show that our learning-based approach, with non-holonomic agents, performs on par with classical algorithms with omni-directional agents, and outperforms their non-holonomic adaptations. The learned policy is successfully transferred to the real world in a proof-of-concept demonstration with three motion-constrained pursuer drones.
Humanity faces numerous problems of common-pool resource appropriation. This class of multi-agent social dilemma includes the problems of ensuring sustainable use of fresh water, common fisheries, grazing pastures, and irrigation systems. Abstract models of common-pool resource appropriation based on non-cooperative game theory predict that self-interested agents will generally fail to find socially positive equilibria---a phenomenon called the tragedy of the commons. However, in reality, human societies are sometimes able to discover and implement stable cooperative solutions. Decades of behavioral game theory research have sought to uncover aspects of human behavior that make this possible. Most of that work was based on laboratory experiments where participants only make a single choice: how much to appropriate. Recognizing the importance of spatial and temporal resource dynamics, a recent trend has been toward experiments in more complex real-time video game-like environments. However, standard methods of non-cooperative game theory can no longer be used to generate predictions for this case. Here we show that deep reinforcement learning can be used instead. To that end, we study the emergent behavior of groups of independently learning agents in a partially observed Markov game modeling common-pool resource appropriation. Our experiments highlight the importance of trial-and-error learning in common-pool resource appropriation and shed light on the relationship between exclusion, sustainability, and inequality.
Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents' ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised coordination. Common knowledge arises naturally in a large number of decentralised cooperative multi-agent tasks, for example, when agents can reconstruct parts of each other's observations. Since agents can independently agree on their common knowledge, they can execute complex coordinated policies that condition on this knowledge in a fully decentralised fashion. We propose multi-agent common knowledge reinforcement learning (MACKRL), a novel stochastic actor-critic algorithm that learns a hierarchical policy tree. Higher levels in the hierarchy coordinate groups of agents by conditioning on their common knowledge, or delegate to lower levels with smaller subgroups but potentially richer common knowledge. The entire policy tree can be executed in a fully decentralised fashion. As the lowest policy tree level consists of independent policies for each agent, MACKRL reduces to independently learnt decentralised policies as a special case. We demonstrate that our method can exploit common knowledge for superior performance on complex decentralised coordination tasks, including a stochastic matrix game and challenging problems in StarCraft II unit micromanagement.
Agent advising is one of the main approaches to improving agent learning performance by enabling agents to share advice. Existing advising methods share a common limitation: an adviser agent can offer advice to an advisee agent only if the advice was created in the same state as the advisee's concerned state. However, in complex environments, requiring two states to be the same is a very strong condition, because a state may consist of multiple dimensions, and two states being the same means that all these dimensions are correspondingly identical. This requirement may therefore limit the applicability of existing advising methods to complex environments. In this paper, inspired by the differential privacy scheme, we propose a differential advising method which relaxes this requirement by enabling agents to use advice in a state even if the advice was created in a slightly different state. Compared with existing methods, agents using the proposed method have more opportunities to take advice from others. This paper is the first to adopt the concept of differential privacy in advising to improve agent learning performance rather than to address security issues. The experimental results demonstrate that the proposed method is more efficient in complex environments than existing methods.
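The relaxed matching requirement can be illustrated with a minimal sketch in which an advisee accepts advice from the closest state the adviser knows, provided the two states differ in at most a fixed number of dimensions. This is a simplified nearest-state lookup written for illustration only: it does not implement the differential-privacy-inspired mechanism of the proposed method, and the names `find_advice`, `adviser_q`, and `max_diff` are hypothetical.

```python
def find_advice(adviser_q, state, max_diff=1):
    """Return the adviser's best action for the closest known state.

    adviser_q maps state tuples to {action: value} dictionaries (assumed layout).
    A known state qualifies if it differs from `state` in at most `max_diff`
    dimensions; max_diff=0 reproduces the strict same-state requirement.
    Returns None when no known state is close enough.
    """
    best = None  # (number of differing dimensions, action values)
    for known_state, action_values in adviser_q.items():
        diff = sum(1 for a, b in zip(known_state, state) if a != b)
        if diff <= max_diff and (best is None or diff < best[0]):
            best = (diff, action_values)
    if best is None:
        return None
    values = best[1]
    return max(values, key=values.get)  # action with the highest value


if __name__ == "__main__":
    adviser_q = {(0, 0, 1): {"left": 0.2, "right": 0.9}}
    print(find_advice(adviser_q, (0, 0, 2)))              # differs in one dimension
    print(find_advice(adviser_q, (0, 0, 2), max_diff=0))  # strict matching
```

With `max_diff=0` the advisee can only use advice created in an identical state, which is the limitation of existing methods described above; a positive `max_diff` gives the advisee more chances to receive advice.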