Towards Resilience for Multi-Agent $QD$-Learning


Abstract

This paper considers the multi-agent reinforcement learning (MARL) problem for a networked (peer-to-peer) system in the presence of Byzantine agents. We build on an existing distributed $Q$-learning algorithm ($QD$-learning) and allow certain agents in the network to behave in an arbitrary and adversarial manner (as captured by the Byzantine attack model). Under the proposed algorithm, if the network topology is $(2F+1)$-robust and up to $F$ Byzantine agents exist in the neighborhood of each regular agent, we establish the almost sure convergence of all regular agents' value functions to a neighborhood of the optimal value function of all regular agents. For each state, if the optimal $Q$-values of all regular agents corresponding to different actions are sufficiently separated, our approach allows each regular agent to learn the optimal policy for all regular agents.
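The abstract does not spell out the update rule, but the combination of a $(2F+1)$-robust topology and an $F$-Byzantine bound per neighborhood suggests a trimmed (W-MSR-style) filter applied to neighbors' $Q$-estimates before a standard consensus + innovation update. Below is a minimal sketch under that assumption; the function names, step sizes, and filtering rule are illustrative and not taken from the paper.

```python
def wmsr_filter(q_own, q_neighbors, F):
    """Discard up to F extreme neighbor values on each side of q_own.

    This is the W-MSR-style trimming rule (an assumption here): each
    regular agent removes the F largest values above its own estimate
    and the F smallest values below it, keeping the rest.
    """
    higher = sorted(v for v in q_neighbors if v > q_own)
    lower = sorted(v for v in q_neighbors if v < q_own)
    equal = [v for v in q_neighbors if v == q_own]
    kept_high = higher[:max(0, len(higher) - F)]  # drop F largest (or all)
    kept_low = lower[F:]                          # drop F smallest (or all)
    return kept_high + kept_low + equal


def resilient_qd_update(q_own, q_neighbors, F, alpha, beta,
                        reward, gamma, q_next_max):
    """One resilient update of Q_i(s, a) for a single regular agent.

    Combines a consensus term over the *filtered* neighbor estimates
    with a local temporal-difference (innovation) term, in the spirit
    of QD-learning. All parameters are hypothetical placeholders.
    """
    kept = wmsr_filter(q_own, q_neighbors, F)
    consensus = beta * sum(q_j - q_own for q_j in kept)
    innovation = alpha * (reward + gamma * q_next_max - q_own)
    return q_own + consensus + innovation


if __name__ == "__main__":
    # Five neighbor reports, one of which (1e6) is Byzantine.
    # With F = 1 the outlier is trimmed before the consensus step.
    q_new = resilient_qd_update(
        q_own=0.5,
        q_neighbors=[0.4, 0.6, 0.55, 0.45, 1e6],
        F=1, alpha=0.1, beta=0.05,
        reward=1.0, gamma=0.9, q_next_max=0.6,
    )
    print(q_new)
```

Note the role of the $(2F+1)$-robustness assumption in this sketch: it guarantees that even after each agent discards up to $2F$ neighbor values, the regular agents' estimates remain sufficiently coupled for consensus to propagate through the network.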
