ﻻ يوجد ملخص باللغة العربية
Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning (MARL) problems. This work considers a collection of $N_{mathrm{pop}}$ heterogeneous agents that can be segregated into $K$ classes such that the $k$-th class contains $N_k$ homogeneous agents. We aim to prove approximation guarantees of the MARL problem for this heterogeneous system by its corresponding MFC problem. We consider three scenarios where the reward and transition dynamics of all agents are respectively taken to be functions of $(1)$ joint state and action distributions across all classes, $(2)$ individual distributions of each class, and $(3)$ marginal distributions of the entire population. We show that, in these cases, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1=mathcal{O}(frac{sqrt{|mathcal{X}||mathcal{U}|}}{N_{mathrm{pop}}}sum_{k}sqrt{N_k})$, $e_2=mathcal{O}(sqrt{|mathcal{X}||mathcal{U}|}sum_{k}frac{1}{sqrt{N_k}})$ and $e_3=mathcal{O}left(sqrt{|mathcal{X}||mathcal{U}|}left[frac{A}{N_{mathrm{pop}}}sum_{kin[K]}sqrt{N_k}+frac{B}{sqrt{N_{mathrm{pop}}}}right]right)$, respectively, where $A, B$ are some constants and $|mathcal{X}|,|mathcal{U}|$ are the sizes of state and action spaces of each agent. Finally, we design a Natural Policy Gradient (NPG) based algorithm that, in the three cases stated above, can converge to an optimal MARL policy within $mathcal{O}(e_j)$ error with a sample complexity of $mathcal{O}(e_j^{-3})$, $jin{1,2,3}$, respectively.
Reinforcement learning in cooperative multi-agent settings has recently advanced significantly in its scope, with applications in cooperative estimation for advertising, dynamic treatment regimes, distributed control, and federated learning. In this
In cooperative multi-agent reinforcement learning (c-MARL), agents learn to cooperatively take actions as a team to maximize a total team reward. We analyze the robustness of c-MARL to adversaries capable of attacking one of the agents on a team. Thr
Training a multi-agent reinforcement learning (MARL) algorithm is more challenging than training a single-agent reinforcement learning algorithm, because the result of a multi-agent task strongly depends on the complex interactions among agents and t
Numerous deep reinforcement learning agents have been proposed, and each of them has its strengths and flaws. In this work, we present a Cooperative Heterogeneous Deep Reinforcement Learning (CHDRL) framework that can learn a policy by integrating th
Value-based methods of multi-agent reinforcement learning (MARL), especially the value decomposition methods, have been demonstrated on a range of challenging cooperative tasks. However, current methods pay little attention to the interaction between