We study the problem of emergent communication, in which language arises because speakers and listeners must communicate information in order to solve tasks. In temporally extended reinforcement learning domains, it has proved hard to learn such communication without centralized training of agents, due in part to a difficult joint exploration problem. We introduce inductive biases for positive signalling and positive listening, which ease this problem. In a simple one-step environment, we demonstrate how these biases ease the learning problem. We also apply our methods to a more extended environment, showing that agents with these inductive biases achieve better performance, and analyse the resulting communication protocols.
In this work, we study emergent communication through the lens of cooperative multi-agent behavior in nature. Using insights from animal communication, we propose a spectrum from low-bandwidth (e.g. pheromone trails) to high-bandwidth (e.g. compositional language)…
Agent advising is one of the main approaches to improving agent learning performance by enabling agents to share advice. Existing advising methods share a common limitation: an adviser agent can offer advice to an advisee agent only if the advice is…
Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents' ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised coordination…
Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness…
We propose a curriculum-driven learning strategy for solving difficult multi-agent coordination tasks. Our method is inspired by a study of animal communication, which shows that two straightforward design features (mutual reward and decentralization)…