Multi-agent Markov Decision Processes (MMDPs) arise in a variety of applications, including target tracking, control of multi-robot swarms, and multiplayer games. A key challenge in MMDPs is that the state and action spaces grow exponentially in the number of agents, making computation of an optimal policy intractable for medium- to large-scale problems. One property that has been exploited to mitigate this complexity is transition independence, in which each agent's transition probabilities are independent of the states and actions of the other agents. Transition independence enables factorization of the MMDP and computation of local agent policies, but it does not hold for arbitrary MMDPs. In this paper, we propose an approximate form of transition independence, called $\delta$-transition dependence, and develop a metric for quantifying how far an MMDP deviates from transition independence. Our definition of $\delta$-transition dependence recovers transition independence as the special case $\delta = 0$. We develop an algorithm, polynomial in the number of agents, that achieves a provable bound relative to the global optimum when the reward functions are monotone increasing and submodular in the agent actions. We evaluate our approach on two case studies: multi-robot control and multi-agent patrolling.
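For concreteness (a sketch based on the abstract, not the paper's exact definitions): with joint state $s = (s_1, \dots, s_n)$ and joint action $a = (a_1, \dots, a_n)$, transition independence asserts the factorization
\[
P(s' \mid s, a) \;=\; \prod_{i=1}^{n} P_i(s'_i \mid s_i, a_i),
\]
and one natural way to quantify deviation from this condition, recovering transition independence exactly at $\delta = 0$, is a worst-case total-variation gap such as
\[
\delta \;=\; \max_{s,\,a}\; \frac{1}{2} \sum_{s'} \Bigl|\, P(s' \mid s, a) \;-\; \prod_{i=1}^{n} P_i(s'_i \mid s_i, a_i) \,\Bigr|,
\]
where the paper's actual metric may differ in the choice of distance or the quantities it maximizes over.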
We present a scalable tree search planning algorithm for large multi-agent sequential decision problems that require dynamic collaboration. Teams of agents need to coordinate decisions in many domains, but naive approaches fail due to the exponential growth of the joint action space with the number of agents. […]
We study the problem of minimizing the resource capacity of autonomous agents cooperating to achieve a shared task. More specifically, we consider high-level planning for a team of homogeneous agents that operate under resource constraints in stochastic environments. […]
We consider the challenging problem of online planning for a team of agents to autonomously search and track a time-varying number of mobile objects under the practical constraint of detection-range-limited onboard sensors. A standard POMDP with a […]
In most multiagent applications, communication among agents is essential to coordinate their actions and thus achieve their goal. However, communication often has an associated cost that affects overall system performance. In this paper, we draw inspiration from […]
Existing evaluation suites for multi-agent reinforcement learning (MARL) do not assess generalization to novel situations as their primary objective (unlike supervised-learning benchmarks). Our contribution, Melting Pot, is a MARL evaluation suite that fills this gap […]