We discuss the problem of learning collaborative behaviour through communication in multi-agent systems using deep reinforcement learning. A connectivity-driven communication (CDC) algorithm is proposed to address three key aspects: which agents to involve in the communication, what information content to share, and how often to share it. The multi-agent system is modelled as a weighted graph with nodes representing agents. The unknown edge weights reflect the degree of communication between pairs of agents, which depends on a diffusion process on the graph: the heat kernel. An optimal communication strategy, tightly coupled with the overall graph topology, is learned end-to-end concurrently with the agents' policies so as to maximise future expected returns. Empirical results show that CDC achieves superior performance over alternative algorithms on a range of cooperative navigation tasks, and that the learned graph structures can be interpretable.
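To make the diffusion process mentioned above concrete, here is a minimal NumPy/SciPy sketch of a heat kernel computed from a graph Laplacian. It is illustrative only: the adjacency matrix `A`, diffusion time `t`, and function name `heat_kernel` are assumptions for the example, not details taken from the paper.

```python
import numpy as np
from scipy.linalg import expm

def heat_kernel(A, t=1.0):
    """Heat kernel H_t = exp(-t * L) for a weighted adjacency matrix A.

    L = D - A is the (unnormalised) graph Laplacian. Entry H_t[i, j] can be
    read as the amount of heat diffusing from node j to node i after time t,
    which a CDC-style approach can use as a proxy for how strongly two
    agents should communicate.
    """
    D = np.diag(A.sum(axis=1))   # degree matrix
    L = D - A                    # graph Laplacian
    return expm(-t * L)          # matrix exponential

# Toy 3-agent graph: agents 0 and 1 are connected, agent 2 is isolated.
A = np.array([[0., 1., 0.],
              [1., 0., 0.],
              [0., 0., 0.]])
H = heat_kernel(A, t=0.5)
print(H.round(3))   # off-diagonal mass appears only between agents 0 and 1
```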
Deep reinforcement learning algorithms have recently been used to train multiple interacting agents in a centralised manner whilst keeping their execution decentralised. When the agents can only acquire partial observations and are faced with tasks r
Multi-agent reinforcement learning (MARL) requires coordination to efficiently solve certain tasks. Fully centralized control is often infeasible in such domains due to the size of joint action spaces. Coordination graph based formalization allows re
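As background for the coordination-graph formalization mentioned above, the sketch below factorises a joint action value into per-agent and per-edge payoff terms and maximises it by enumeration. The edge set, payoff tables, and brute-force maximisation are illustrative assumptions, not the algorithm summarised in the abstract.

```python
import itertools
import numpy as np

# Hypothetical 3-agent coordination graph with edges (0,1) and (1,2).
edges = [(0, 1), (1, 2)]
n_agents, n_actions = 3, 2

rng = np.random.default_rng(0)
f_i = rng.normal(size=(n_agents, n_actions))                          # individual utilities f_i(a_i)
f_ij = {e: rng.normal(size=(n_actions, n_actions)) for e in edges}    # pairwise payoffs f_ij(a_i, a_j)

def joint_q(actions):
    """Q(a) = sum_i f_i(a_i) + sum_{(i,j) in E} f_ij(a_i, a_j)."""
    q = sum(f_i[i, a] for i, a in enumerate(actions))
    q += sum(f_ij[(i, j)][actions[i], actions[j]] for (i, j) in edges)
    return q

# Brute-force maximisation over the joint action space; in practice,
# message passing (e.g. max-plus) on the graph avoids this enumeration.
best = max(itertools.product(range(n_actions), repeat=n_agents), key=joint_q)
print(best, joint_q(best))
```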
We propose a targeted communication architecture for multi-agent reinforcement learning, where agents learn both what messages to send and whom to address them to while performing cooperative tasks in partially-observable environments. This targeting
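One common way to realise such targeting is soft attention between sender keys and receiver queries; the sketch below shows that mechanism in isolation, under the assumption that this is how targeting is implemented. The names `keys`, `queries`, `values` and the dimensions are placeholders for illustration, not the architecture's actual components.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def targeted_messages(keys, queries, values):
    """Soft targeting: receiver i attends to sender j with weight
    softmax_j(q_i . k_j), so senders influence whom they address via their
    keys while receivers choose whom to listen to via their queries.

    keys, queries: (n_agents, d_k); values (messages): (n_agents, d_v).
    Returns the aggregated incoming message for every agent: (n_agents, d_v).
    """
    attn = softmax(queries @ keys.T, axis=-1)   # (n_receivers, n_senders)
    return attn @ values

rng = np.random.default_rng(0)
n, d_k, d_v = 4, 8, 16
msgs = targeted_messages(rng.normal(size=(n, d_k)),
                         rng.normal(size=(n, d_k)),
                         rng.normal(size=(n, d_v)))
print(msgs.shape)   # (4, 16)
```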
We consider the problem where $N$ agents collaboratively interact with an instance of a stochastic $K$-armed bandit problem for $K \gg N$. The agents aim to simultaneously minimize the cumulative regret over all the agents for a total of $T$ time steps,
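For reference, the quantity being minimised in this setting can be sketched as below. The per-agent play in the example is a uniformly random placeholder rather than the proposed algorithm, and all names are illustrative assumptions.

```python
import numpy as np

def group_cumulative_regret(means, pulls):
    """Cumulative regret summed over all agents.

    means: true mean reward of each of the K arms.
    pulls: array of shape (N, T) giving the arm each agent pulled at each
           of the T rounds.
    Regret = N * T * mu_star - sum of the means of all pulled arms.
    """
    mu_star = means.max()
    n_agents, horizon = pulls.shape
    return n_agents * horizon * mu_star - means[pulls].sum()

rng = np.random.default_rng(0)
K, N, T = 20, 3, 1000                       # K >> N, as in the setting above
means = rng.uniform(size=K)
pulls = rng.integers(0, K, size=(N, T))     # placeholder: uniformly random play
print(group_cumulative_regret(means, pulls))
```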
In this work, we propose a novel memory-based multi-agent meta-learning architecture and learning procedure that allows for learning of a shared communication policy that enables the emergence of rapid adaptation to new and unseen environments by lea