Multi-agent Reinforcement Learning Accelerated MCMC on Multiscale Inversion Problem


Published by Zecheng Zhang

Publication date: 2020

Research field: Information Engineering

Paper language: English

Authors: Eric Chung, Yalchin Efendiev, Wing Tat Leung


In this work, we propose a multi-agent actor-critic reinforcement learning (RL) algorithm to accelerate multi-level Markov chain Monte Carlo (MCMC) sampling algorithms. The policies (actors) of the agents generate the proposals in the MCMC steps, and a centralized critic estimates the long-term reward. We verify the proposed algorithm by solving an inverse problem with multiple scales. Traditional MCMC sampling faces several difficulties on this problem. First, computing the posterior distribution requires evaluating the forward solver, which is very time-consuming for heterogeneous problems. We therefore propose a multi-level algorithm: more precisely, we use the generalized multiscale finite element method (GMsFEM) as the forward solver when evaluating the posterior distribution in the multi-level rejection procedure. Second, it is hard to find a proposal function that generates meaningful samples. To address this, we learn an RL policy as the proposal generator. Our experiments show that the proposed method significantly improves the sampling process.
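The multi-level rejection idea in the abstract can be illustrated with a minimal two-stage (delayed-acceptance) Metropolis-Hastings sketch. This is not the authors' code. A cheap coarse posterior (standing in for a GMsFEM solve with few basis functions) screens each proposal, and only proposals that survive are evaluated with the expensive fine posterior. The names `two_stage_mh`, `log_post_fine`, `log_post_coarse`, and `propose` are hypothetical, and a symmetric random-walk proposal stands in for the learned RL policy. A non-symmetric policy would require adding the proposal-density ratio to the stage-1 test; the stage-2 test stays the same.

```python
import math
import random
from typing import Callable, List, Tuple


def two_stage_mh(
    log_post_fine: Callable[[float], float],
    log_post_coarse: Callable[[float], float],
    propose: Callable[[float, random.Random], float],
    x0: float,
    n_steps: int,
    seed: int = 0,
) -> Tuple[List[float], int]:
    """Two-stage (delayed-acceptance) Metropolis-Hastings sketch.

    Assumes `propose` is symmetric, i.e. q(x, y) == q(y, x); here it is a
    hypothetical stand-in for a learned RL proposal policy.
    Returns the chain and the number of fine-posterior evaluations.
    """
    rng = random.Random(seed)
    x = x0
    lf_x = log_post_fine(x)    # expensive: fine-scale forward solve
    lc_x = log_post_coarse(x)  # cheap: coarse (e.g. GMsFEM) forward solve
    chain = [x]
    fine_evals = 1
    for _ in range(n_steps):
        y = propose(x, rng)
        lc_y = log_post_coarse(y)
        # Stage 1: screen the proposal with the cheap coarse posterior.
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) >= lc_y - lc_x:
            chain.append(x)  # rejected without calling the fine solver
            continue
        # Stage 2: correct with the fine posterior so that the chain
        # still targets the fine posterior exactly.
        lf_y = log_post_fine(y)
        fine_evals += 1
        if math.log(1.0 - rng.random()) < (lf_y - lf_x) - (lc_y - lc_x):
            x, lf_x, lc_x = y, lf_y, lc_y
        chain.append(x)
    return chain, fine_evals


if __name__ == "__main__":
    # Toy 1D example: fine posterior N(0, 1), coarse approximation N(0.2, 1.2).
    chain, fine_evals = two_stage_mh(
        log_post_fine=lambda x: -0.5 * x * x,
        log_post_coarse=lambda x: -0.5 * (x - 0.2) ** 2 / 1.2,
        propose=lambda x, rng: x + rng.gauss(0.0, 1.0),
        x0=0.0,
        n_steps=20000,
        seed=1,
    )
    kept = chain[1000:]
    print(sum(kept) / len(kept), fine_evals)
```

The saving comes from stage 1: every proposal rejected by the coarse posterior costs no fine solve, which `fine_evals` counts. The stage-2 ratio divides out the coarse screening, so the chain remains a valid sampler of the fine posterior even though the coarse model is only an approximation.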


Wenling Shang, Lasse Espeholt, Anton Raichuk, 2021

Object-centric representations have recently enabled significant progress in tackling relational reasoning tasks. By building a strong object-centric inductive bias into neural architectures, recent efforts have improved the generalization and data efficiency of machine learning algorithms for these problems. One problem class involving relational reasoning that remains under-explored is multi-agent reinforcement learning (MARL). Here we investigate whether object-centric representations are also beneficial in the fully cooperative MARL setting. Specifically, we study two ways of incorporating an agent-centric inductive bias into our RL algorithm: (1) introducing an agent-centric attention module with explicit connections across agents, and (2) adding an agent-centric unsupervised predictive objective (i.e., one that does not use action labels), to be used either as an auxiliary loss for MARL or as the basis of a pre-training step. We evaluate these approaches on the Google Research Football environment as well as DeepMind Lab 2D. Empirically, agent-centric representation learning leads to the emergence of more complex cooperation strategies between agents, as well as enhanced sample efficiency and generalization.

Machine Learning, Artificial Intelligence

On the Robustness of Cooperative Multi-Agent Reinforcement Learning

Jieyu Lin, Kristina Dzeparoska, Sai Qian Zhang, 2020

In cooperative multi-agent reinforcement learning (c-MARL), agents learn to cooperatively take actions as a team to maximize a total team reward. We analyze the robustness of c-MARL to adversaries capable of attacking one of the agents on a team. Through the ability to manipulate this agent's observations, the adversary seeks to decrease the total team reward. Attacking c-MARL is challenging for three reasons: first, it is difficult to estimate team rewards or how they are impacted by an agent mispredicting; second, models are non-differentiable; and third, the feature space is low-dimensional. Thus, we introduce a novel attack. The attacker first trains a policy network with reinforcement learning to find a wrong action it should encourage the victim agent to take. Then, the adversary uses targeted adversarial examples to force the victim to take this action. Our results on the StarCraft II multi-agent benchmark demonstrate that c-MARL teams are highly vulnerable to perturbations applied to one of their agents' observations. By attacking a single agent, our attack method has a highly negative impact on the overall team reward, reducing it from 20 to 9.4. This causes the team's winning rate to drop from 98.9% to 0%.

Machine Learning, Cryptography and Security

Emergent Social Learning via Multi-agent Reinforcement Learning

Kamal Ndousse, Douglas Eck, Sergey Levine, 2020

Social learning is a key component of human and animal intelligence. By taking cues from the behavior of experts in their environment, social learners can acquire sophisticated behavior and rapidly adapt to new circumstances. This paper investigates whether independent reinforcement learning (RL) agents in a multi-agent environment can learn to use social learning to improve their performance. We find that in most circumstances, vanilla model-free RL agents do not use social learning. We analyze the reasons for this deficiency, and show that by imposing constraints on the training environment and introducing a model-based auxiliary loss we are able to obtain generalized social learning policies which enable agents to: (i) discover complex skills that are not learned from single-agent training, and (ii) adapt online to novel environments by taking cues from experts present in the new environment. In contrast, agents trained with model-free RL or imitation learning generalize poorly and do not succeed in the transfer tasks. By mixing multi-agent and solo training, we can obtain agents that use social learning to gain skills that they can deploy when alone, even outperforming agents trained alone from the start.

Machine Learning, Artificial Intelligence, Multiagent Systems

Mutual Information for Explainable Deep Learning of Multiscale Systems

Søren Taverniers, Eric J. Hall, Markos A. Katsoulakis, Daniel M. Tartakovsky, 2020

Timely completion of design cycles for complex systems, ranging from consumer electronics to hypersonic vehicles, relies on rapid simulation-based prototyping. The latter typically involves high-dimensional spaces of possibly correlated control variables (CVs) and quantities of interest (QoIs) with non-Gaussian and possibly multimodal distributions. We develop a model-agnostic, moment-independent global sensitivity analysis (GSA) that relies on differential mutual information to rank the effects of CVs on QoIs. The data requirements of this information-theoretic approach to GSA are met by replacing computationally intensive components of the physics-based model with a deep neural network surrogate. Subsequently, the GSA is used to explain the network predictions, and the surrogate is deployed to close design loops. Viewed as an uncertainty quantification method for interrogating the surrogate, this framework is compatible with a wide variety of black-box models. We demonstrate that the surrogate-driven mutual information GSA provides useful and distinguishable rankings on two applications of interest in energy storage. Consequently, our information-theoretic GSA provides an outer loop for accelerated product design by identifying the most and least sensitive input directions and performing subsequent optimization over appropriately reduced parameter subspaces.

Machine Learning, Numerical Analysis

Reinforcement Learning for Adaptive Mesh Refinement

Jiachen Yang, Tarik Dzanic, Brenden Petersen, 2021

Large-scale finite element simulations of complex physical systems governed by partial differential equations crucially depend on adaptive mesh refinement (AMR) to allocate computational budget to regions where higher resolution is required. Existing scalable AMR methods make heuristic refinement decisions based on instantaneous error estimation and thus do not aim for long-term optimality over an entire simulation. We propose a novel formulation of AMR as a Markov decision process and apply deep reinforcement learning (RL) to train refinement policies directly from simulation. AMR poses a new problem for RL in that both the state dimension and the available action set change at every step, which we solve by proposing new policy architectures with differing generality and inductive bias. The model sizes of these policy architectures are independent of the mesh size and hence scale to arbitrarily large and complex simulations. We demonstrate in comprehensive experiments on static function estimation and the advection of different fields that RL policies can be competitive with a widely used error estimator and generalize to larger, more complex, and unseen test problems.

Machine Learning, Numerical Analysis