
Transfer among Agents: An Efficient Multiagent Transfer Learning Framework

Posted by Tianpei Yang
Publication date: 2020
Research field: Informatics Engineering
Paper language: English

Transfer Learning has shown great potential to enhance single-agent Reinforcement Learning (RL) efficiency. Similarly, Multiagent RL (MARL) can also be accelerated if agents can share knowledge with each other. However, how an agent should learn from other agents remains an open problem. In this paper, we propose a novel Multiagent Option-based Policy Transfer (MAOPT) framework to improve MARL efficiency. MAOPT learns what advice to provide to each agent and when to terminate it by modeling multiagent policy transfer as an option learning problem. Our framework provides two kinds of option learning methods, distinguished by what experience is used during training. One is the global option advisor, which uses global experience for its updates. The other is the local option advisor, which uses each agent's local experience when only local experiences can be obtained due to partial observability. In this setting, however, the agents' experiences may be inconsistent with one another, which can make the option-value estimates inaccurate and unstable. We therefore propose successor representation option learning, which addresses this by decoupling the environment dynamics from the rewards and learning option-values under each agent's own preference. MAOPT can be easily combined with existing deep RL and MARL approaches, and experimental results show that it significantly boosts the performance of existing methods in both discrete and continuous state spaces.
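As a rough illustration of the successor-representation idea above (decoupling environment dynamics from rewards so that option-values can be evaluated under each agent's own reward preference), here is a minimal tabular sketch. It is an assumption-laden toy, not the paper's implementation: the state/option sizes, learning rate, and update rules are all illustrative.

```python
import numpy as np

# Toy successor-representation (SR) option-value learner (illustrative only).
# Idea: Q(s, o) factorises as psi(s, o) . w, where psi captures expected
# discounted state occupancy (dynamics) and w captures rewards, so the two
# can be learned and updated separately.

n_states, n_options = 16, 4   # assumed sizes, not from the paper
gamma, alpha = 0.95, 0.1      # assumed discount and learning rate

psi = np.zeros((n_options, n_states, n_states))  # SR per option
w = np.zeros(n_states)                           # agent-specific reward weights

def sr_update(s, o, r, s_next, o_next):
    """One TD step; dynamics (psi) and reward (w) are updated separately."""
    onehot = np.eye(n_states)[s]
    # Successor features obey a Bellman equation in occupancy space.
    psi[o, s] += alpha * (onehot + gamma * psi[o_next, s_next] - psi[o, s])
    # Reward weights are fit by simple regression on observed rewards.
    w[s] += alpha * (r - w[s])

def option_value(s, o):
    # Recombine the shared dynamics with this agent's reward preference.
    return psi[o, s] @ w
```

Because `w` is per-agent while `psi` reflects shared dynamics, inconsistent local experiences perturb only the reward fit rather than the whole option-value estimate, which is the intuition behind the decoupling.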

Read also

In many real-world tasks, multiple agents must learn to coordinate with each other given their private observations and limited communication ability. Deep multiagent reinforcement learning (Deep-MARL) algorithms have shown superior performance in such challenging settings. One representative class of work is multiagent value decomposition, which decomposes the global shared multiagent Q-value $Q_{tot}$ into individual Q-values $Q^{i}$ to guide individuals' behaviors, e.g. VDN imposing an additive form and QMIX adopting a monotonicity assumption via an implicit mixing method. However, most previous efforts impose certain assumptions on the relation between $Q_{tot}$ and $Q^{i}$ and lack theoretical grounding. Besides, they do not explicitly consider the agent-level impact of individuals on the whole system when transforming individual $Q^{i}$s into $Q_{tot}$. In this paper, we theoretically derive a general formula for $Q_{tot}$ in terms of $Q^{i}$, based on which we can naturally implement a multi-head attention formation to approximate $Q_{tot}$, resulting not only in a refined representation of $Q_{tot}$ with an agent-level attention mechanism, but also in a tractable maximization algorithm for decentralized policies. Extensive experiments demonstrate that our method outperforms state-of-the-art MARL methods on the widely adopted StarCraft benchmark across different scenarios, and attention analysis is further conducted with valuable insights.
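As a loose sketch of how an agent-level attention mechanism might combine individual Q-values into $Q_{tot}$, consider the following numpy toy. The dimensions, the use of a global state embedding as the query, and the additive combination of heads are all assumptions for illustration; the paper derives its exact formation theoretically.

```python
import numpy as np

# Toy multi-head, agent-level attention mixer: Q_tot as a weighted sum of
# individual Q-values Q^i, with weights computed from a global state feature.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mix_qtot(q_i, state_feat, keys):
    """q_i: (n_agents,) individual Q-values.
    state_feat: (d,) global state embedding, used as the attention query.
    keys: (n_heads, n_agents, d) per-head, per-agent key vectors."""
    d = len(state_feat)
    q_tot = 0.0
    for head_keys in keys:
        # Agent-level attention: how much each agent contributes to Q_tot.
        weights = softmax(head_keys @ state_feat / np.sqrt(d))
        q_tot += weights @ q_i
    return q_tot

rng = np.random.default_rng(0)
print(mix_qtot(rng.normal(size=3),            # 3 agents
               rng.normal(size=8),            # state embedding
               rng.normal(size=(4, 3, 8))))   # 4 heads
```

Since the softmax weights are non-negative, $Q_{tot}$ stays monotone in each $Q^{i}$, which is what keeps decentralized greedy action selection consistent with maximizing the global value.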
Much of the recent success in multiagent reinforcement learning has been in two-player zero-sum games. In these games, algorithms such as fictitious self-play and minimax tree search can converge to an approximate Nash equilibrium. While playing a Nash equilibrium strategy in a two-player zero-sum game is optimal, in an $n$-player general-sum game it becomes a much less informative solution concept. Despite the lack of a satisfying solution concept, $n$-player games form the vast majority of real-world multiagent situations. In this paper we present a new framework for research in reinforcement learning in $n$-player games. We hope that by analyzing behavior learned by agents in these environments the community can better understand this important research area and move toward meaningful solution concepts and research directions. The implementation and additional information about this framework can be found at https://colosseumrl.igb.uci.edu/.
Modeling agent behavior is central to understanding the emergence of complex phenomena in multiagent systems. Prior work in agent modeling has largely been task-specific and driven by hand-engineering domain-specific prior knowledge. We propose a general learning framework for modeling agent behavior in any multiagent system from only a handful of interaction data. Our framework casts agent modeling as a representation learning problem. Consequently, we construct a novel objective inspired by imitation learning and agent identification, and design an algorithm for unsupervised learning of representations of agent policies. We demonstrate empirically the utility of the proposed framework in (i) a challenging high-dimensional competitive environment for continuous control and (ii) a cooperative environment for communication, on supervised predictive tasks, unsupervised clustering, and policy optimization using deep reinforcement learning.
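A hypothetical PyTorch sketch of the two-part objective described here: an imitation term (predict an agent's actions from observations, conditioned on a learned policy embedding) plus an agent-identification term (the embedding should reveal which agent generated the episode). The module shapes, the GRU encoder, and the 1:1 loss weighting are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, emb_dim, n_agents = 10, 4, 16, 5  # assumed sizes

encoder = nn.GRU(obs_dim + act_dim, emb_dim, batch_first=True)
imitator = nn.Linear(obs_dim + emb_dim, act_dim)  # action logits
identifier = nn.Linear(emb_dim, n_agents)         # which agent produced z?

def representation_loss(episode, obs, actions, agent_id):
    """episode: (1, T, obs_dim + act_dim) one agent's interaction data.
    obs: (N, obs_dim) float, actions: (N,) long, agent_id: (1,) long."""
    _, h = encoder(episode)
    z = h[-1]                                  # (1, emb_dim) policy embedding
    logits = imitator(torch.cat([obs, z.expand(len(obs), -1)], dim=-1))
    imitation = F.cross_entropy(logits, actions)               # act like it
    identification = F.cross_entropy(identifier(z), agent_id)  # identify it
    return imitation + identification          # 1:1 weighting assumed
```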
Collective human knowledge has clearly benefited from the fact that innovations by individuals are taught to others through communication. Similar to human social groups, agents in distributed learning systems would likely benefit from communication to share knowledge and teach skills. The problem of teaching to improve agent learning has been investigated by prior works, but these approaches either make assumptions that prevent application of teaching to general multiagent problems, or require domain expertise for the problems to which they do apply. This learning-to-teach problem has inherent complexities related to measuring the long-term impacts of teaching, which compound the standard multiagent coordination challenges. In contrast to existing works, this paper presents the first general framework and algorithm for intelligent agents to learn to teach in a multiagent environment. Our algorithm, Learning to Coordinate and Teach Reinforcement (LeCTR), addresses peer-to-peer teaching in cooperative multiagent reinforcement learning. Each agent in our approach learns both when and what to advise, then uses the received advice to improve local learning. Importantly, these roles are not fixed: the agents learn to assume the role of student and/or teacher at the appropriate moments, requesting and providing advice in order to improve teamwide performance and learning. Empirical comparisons against state-of-the-art teaching methods show that our teaching agents not only learn significantly faster, but also learn to coordinate in tasks where existing methods fail.
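To make the student/teacher roles concrete, here is a heavily simplified sketch of a peer-to-peer advising step. The confidence heuristic for requesting advice and the greedy advising rule are stand-ins: LeCTR learns both decisions (when and what to advise) rather than hard-coding them as done here.

```python
import numpy as np

class Peer:
    """An agent that can act as student (request advice) or teacher (give it)."""

    def __init__(self, n_states, n_actions, ask_threshold=0.1):
        self.q = np.zeros((n_states, n_actions))
        self.ask_threshold = ask_threshold  # illustrative heuristic knob

    def wants_advice(self, s):
        # Student role: ask when own Q-values barely discriminate actions.
        # (LeCTR instead learns a requesting policy.)
        return self.q[s].max() - self.q[s].mean() < self.ask_threshold

    def advise(self, s):
        # Teacher role: here we just suggest our own greedy action.
        # (LeCTR instead learns an advising policy, rewarded by the
        # student's long-term learning improvement.)
        return int(self.q[s].argmax())

def choose_action(student, teacher, s):
    if student.wants_advice(s):
        return teacher.advise(s)
    return int(student.q[s].argmax())
```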
Synchronized movement of (both unicellular and multicellular) systems can be observed almost everywhere. Understanding how organisms are regulated into synchronized behavior is one of the challenging issues in the field of collective motion. It is hypothesized that one or a few agents in a group, known as leader(s), regulate the dynamics of the whole collective. Identifying the leader (influential) agent(s) is therefore crucial. This article reviews different mathematical models that represent different types of leadership, with a focus on improving the leader-follower classification problem. Using a simulation model, it was found that incorporating interaction-domain information significantly improves leader-follower classification ability under both linear schemes and information-theoretic schemes for quantifying influence. This article also reviews different schemes that can be used to identify the interaction domain from the motion data of agents.
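As an example of the simplest kind of linear influence measure, the sketch below classifies the leader of a pair via time-delayed cross-correlation: if the past of agent x predicts the present of agent y better than vice versa, x is labeled the leader. The fixed lag and synthetic data are assumptions; the reviewed literature also uses information-theoretic measures such as transfer entropy, and the article's point is that restricting such measures to the identified interaction domain improves classification.

```python
import numpy as np

def delayed_corr(x, y, lag):
    """Correlation between x shifted lag steps into the past and y now."""
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]

def classify_leader(x, y, lag=5):
    # Directional comparison: whose past better explains the other's present?
    return "x leads" if delayed_corr(x, y, lag) > delayed_corr(y, x, lag) \
        else "y leads"

rng = np.random.default_rng(1)
leader = np.cumsum(rng.normal(size=200))                     # leader heading
follower = np.roll(leader, 5) + 0.1 * rng.normal(size=200)   # delayed copy
print(classify_leader(leader, follower))                     # expect "x leads"
```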