To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL). The simplest form is independent reinforcement learning (InRL), where each agent treats its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit to the other agents' policies during training, failing to sufficiently generalize during execution. We introduce a new metric, joint-policy correlation, to quantify this effect. We describe an algorithm for general MARL, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection. The algorithm generalizes previous ones such as InRL, iterated best response, double oracle, and fictitious play. Then, we present a scalable implementation that reduces the memory requirement using decoupled meta-solvers. Finally, we demonstrate the generality of the resulting policies in two partially observable settings: gridworld coordination games and poker.
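The loop described in the abstract (compute a meta-strategy over the current policy sets, then add a best response to it) can be illustrated on a tiny matrix game. The sketch below is not the paper's implementation. It makes several loud assumptions: the game is rock-paper-scissors, the deep reinforcement learning oracle is swapped for an exact best response over pure actions, and the meta-solver is plain fictitious play. The names `PAYOFF`, `fictitious_play`, and `double_oracle_style_loop` are hypothetical and chosen only for illustration.

```python
# Row player's payoffs in zero-sum rock-paper-scissors (0=rock, 1=paper, 2=scissors).
# ASSUMPTION: a toy game standing in for the paper's gridworld and poker domains.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def fictitious_play(payoff, rows, cols, iters=2000):
    """Approximate meta-strategies for the game restricted to rows x cols."""
    row_counts = {r: 0 for r in rows}
    col_counts = {c: 0 for c in cols}
    row_counts[rows[0]] = 1
    col_counts[cols[0]] = 1
    for _ in range(iters):
        # Both players best-respond to the opponent's empirical play so far.
        br_row = max(rows, key=lambda r: sum(payoff[r][c] * n for c, n in col_counts.items()))
        br_col = min(cols, key=lambda c: sum(payoff[r][c] * n for r, n in row_counts.items()))
        row_counts[br_row] += 1
        col_counts[br_col] += 1
    row_total = sum(row_counts.values())
    col_total = sum(col_counts.values())
    row_mix = {r: n / row_total for r, n in row_counts.items()}
    col_mix = {c: n / col_total for c, n in col_counts.items()}
    return row_mix, col_mix


def double_oracle_style_loop(payoff, max_epochs=10):
    """Grow each player's strategy set with best responses to the meta-strategy."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    rows, cols = [0], [0]  # start from a single policy per player
    for _ in range(max_epochs):
        row_mix, col_mix = fictitious_play(payoff, rows, cols)
        # ASSUMPTION: exact best-response oracle instead of a learned deep RL policy.
        new_row = max(range(n_rows), key=lambda r: sum(payoff[r][c] * p for c, p in col_mix.items()))
        new_col = min(range(n_cols), key=lambda c: sum(payoff[r][c] * p for r, p in row_mix.items()))
        added = False
        if new_row not in rows:
            rows.append(new_row)
            added = True
        if new_col not in cols:
            cols.append(new_col)
            added = True
        if not added:
            break  # no new best response: the restricted game is closed
    return rows, cols, fictitious_play(payoff, rows, cols)


if __name__ == "__main__":
    rows, cols, (row_mix, col_mix) = double_oracle_style_loop(PAYOFF)
    print("row strategies:", rows, "meta-strategy:", row_mix)
    print("col strategies:", cols, "meta-strategy:", col_mix)
```

Starting from rock alone, each epoch adds the action that beats the opponent's current meta-strategy, so both strategy sets grow until all three actions are present. Replacing the meta-solver or the oracle changes which earlier algorithm the loop resembles, which is the sense in which the abstract says the approach generalizes iterated best response, double oracle, and fictitious play.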
The Iterated Prisoner's Dilemma has guided research on social dilemmas for decades. However, it distinguishes between only two atomic actions: cooperate and defect. In real-world prisoner's dilemmas, these choices are temporally extended and different
In timeline-based planning, domains are described as sets of independent, but interacting, components, whose behaviour over time (the set of timelines) is governed by a set of temporal constraints. A distinguishing feature of timeline-based planning
A game-theoretic framework is used to study the effect of constellation size on the energy efficiency of wireless networks for M-QAM modulation. A non-cooperative game is proposed in which each user seeks to choose its transmit power (and possibly tr
It is a long-standing goal of artificial intelligence (AI) to be superior to human beings in decision making. Games are suitable for testing AI capabilities of making good decisions in non-numerical tasks. In this paper, we develop a new AI algorithm