Training a multi-agent reinforcement learning (MARL) model is generally difficult because the reward signal arises from numerous combinations of complex interactions among agents. Training becomes even harder when the reward signal is sparse. Previous studies have tried to resolve this issue by employing an intrinsic reward, a signal specifically designed to induce particular interactions among agents, to boost MARL training. However, this approach requires extensive prior knowledge to design the intrinsic reward. To optimize the training of a MARL model, we instead propose a learning-based exploration strategy that generates the initial states of a game. The proposed method adopts a variational graph autoencoder to represent a game state such that (1) the state can be compactly encoded into a latent representation that captures the relationships among agents, and (2) the latent representation can serve as an effective input to a surrogate model that predicts an exploration score. The method then searches for latent representations that maximize the surrogate model's predicted score and decodes them to generate the initial states from which the MARL model starts training. Empirically, we demonstrate that the generated states improve MARL training and final performance more than existing exploration methods.
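To make the pipeline concrete, the following is a minimal PyTorch sketch of the three pieces the abstract describes: a graph encoder that compresses per-agent features into a latent, a surrogate scorer over that latent, and a gradient search that decodes the best latent back into an initial state. All names (GCNLayer, StateVGAE, search_initial_state), dimensions, and design details are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the described pipeline; all names and shapes are
# hypothetical assumptions, not the paper's actual code.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution step: aggregate neighbor features through a
    normalized adjacency matrix, then apply a linear map."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)
    def forward(self, x, adj_norm):            # x: (n_agents, d_in)
        return torch.relu(self.lin(adj_norm @ x))

class StateVGAE(nn.Module):
    """Variational graph autoencoder over a game state: agents are nodes,
    their relations the edges; the state is compressed to a latent z."""
    def __init__(self, d_feat, d_hid, d_z):
        super().__init__()
        self.enc = GCNLayer(d_feat, d_hid)
        self.mu = nn.Linear(d_hid, d_z)
        self.logvar = nn.Linear(d_hid, d_z)
        self.dec = nn.Sequential(nn.Linear(d_z, d_hid), nn.ReLU(),
                                 nn.Linear(d_hid, d_feat))
    def encode(self, x, adj_norm):
        h = self.enc(x, adj_norm)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparam trick
        return z, mu, logvar
    def decode(self, z):
        return self.dec(z)                      # candidate game state

def search_initial_state(vgae, surrogate, z_init, steps=100, lr=1e-2):
    """Gradient-ascend the latent toward a higher predicted exploration
    score, then decode it into an initial state for MARL training."""
    z = z_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-surrogate(z).sum()).backward()        # maximize the surrogate score
        opt.step()
    return vgae.decode(z.detach())
```

A simple MLP, e.g. `nn.Sequential(nn.Linear(d_z, 64), nn.ReLU(), nn.Linear(64, 1))`, would play the role of the surrogate exploration-score model in this sketch.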
VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized action value function as a monotonic mixing of per-agent utilities. While this enables easy decentralization of the learned policy, the restricted class of joint action value functions they can represent limits performance on tasks that require significant coordination among agents.
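For reference, below is a compact sketch of the monotonic mixing such methods use: per-agent utilities are combined through a state-conditioned mixing network whose weights are forced non-negative (via an absolute value), which guarantees dQ_tot/dQ_i >= 0 and therefore consistent greedy decentralization. This is a generic QMIX-style mixer, not code from the cited work.

```python
# Generic QMIX-style monotonic mixing network; names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    def __init__(self, n_agents, state_dim, embed=32):
        super().__init__()
        # Hypernetworks generate state-conditioned mixing parameters.
        self.w1 = nn.Linear(state_dim, n_agents * embed)
        self.b1 = nn.Linear(state_dim, embed)
        self.w2 = nn.Linear(state_dim, embed)
        self.b2 = nn.Linear(state_dim, 1)
        self.n_agents, self.embed = n_agents, embed

    def forward(self, agent_qs, state):         # agent_qs: (B, n_agents)
        B = agent_qs.size(0)
        # .abs() enforces non-negative mixing weights -> monotonicity.
        w1 = self.w1(state).abs().view(B, self.n_agents, self.embed)
        h = F.elu(agent_qs.unsqueeze(1) @ w1 + self.b1(state).unsqueeze(1))
        w2 = self.w2(state).abs().view(B, self.embed, 1)
        return (h @ w2).squeeze(-1) + self.b2(state)   # Q_tot: (B, 1)
```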
Solving tasks with sparse rewards is one of the most important challenges in reinforcement learning. In the single-agent setting, this challenge is addressed by introducing intrinsic rewards that motivate agents to explore unseen regions of their state spaces.
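One standard instantiation of such an intrinsic reward is a count-based novelty bonus, sketched below as a generic example (not tied to any particular paper): states are rewarded in inverse proportion to how often they have been visited.

```python
# Generic count-based intrinsic reward; purely illustrative.
from collections import defaultdict
import math

class CountBonus:
    def __init__(self, scale=1.0):
        self.counts = defaultdict(int)
        self.scale = scale

    def __call__(self, state_key):
        """Return an exploration bonus r_int = scale / sqrt(N(s))."""
        self.counts[state_key] += 1
        return self.scale / math.sqrt(self.counts[state_key])

# Usage (assuming a hashable encoding of the observation):
#   total_reward = env_reward + bonus(hash(obs.tobytes()))
```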
High sample complexity remains a barrier to the application of reinforcement learning (RL), particularly in multi-agent systems. A large body of work has demonstrated that exploration mechanisms based on the principle of optimism under uncertainty can substantially reduce sample complexity.
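A minimal illustration of optimism under uncertainty is a UCB-style action rule: the agent acts greedily on Q plus a bonus that shrinks with visit counts, so rarely tried state-action pairs look optimistically valuable. The class below is a hypothetical tabular sketch, not any cited method.

```python
# Hypothetical tabular Q-learning agent with a UCB-style optimism bonus.
import math
from collections import defaultdict

class OptimisticQ:
    def __init__(self, n_actions, c=1.0, alpha=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)   # Q(s, a)
        self.n = defaultdict(lambda: [0] * n_actions)     # visit counts
        self.c, self.alpha, self.n_actions = c, alpha, n_actions

    def act(self, s, t):
        """Greedy on Q plus an uncertainty bonus at timestep t."""
        scores = [self.q[s][a] + self.c * math.sqrt(math.log(t + 1) /
                  (self.n[s][a] + 1)) for a in range(self.n_actions)]
        return max(range(self.n_actions), key=scores.__getitem__)

    def update(self, s, a, r, q_next):
        """One TD step; q_next is the bootstrapped next-state value."""
        self.n[s][a] += 1
        self.q[s][a] += self.alpha * (r + q_next - self.q[s][a])
```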
Reinforcement learning in multi-agent scenarios is important for real-world applications but presents challenges beyond those seen in single-agent settings. We present an actor-critic algorithm that trains decentralized policies in multi-agent settings.
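The common pattern behind such algorithms is centralized training with decentralized execution: per-agent actors condition only on local observations, while a critic that sees the joint observation-action is used during training and discarded at execution. The schematic PyTorch sketch below illustrates that pattern only; all names and the attention-free critic are simplifying assumptions.

```python
# Schematic centralized-critic / decentralized-actor setup; illustrative.
import torch
import torch.nn as nn

def mlp(d_in, d_out, d_hid=64):
    return nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU(),
                         nn.Linear(d_hid, d_out))

class CTDEAgents(nn.Module):
    def __init__(self, n_agents, obs_dim, n_actions):
        super().__init__()
        # Decentralized actors: one policy per agent, local obs only.
        self.actors = nn.ModuleList(
            mlp(obs_dim, n_actions) for _ in range(n_agents))
        # Centralized critic: joint observations and one-hot joint actions.
        self.critic = mlp(n_agents * (obs_dim + n_actions), 1)

    def act(self, obs):                          # obs: (n_agents, obs_dim)
        logits = torch.stack([pi(o) for pi, o in zip(self.actors, obs)])
        return torch.distributions.Categorical(logits=logits).sample()

    def value(self, obs, actions_onehot):        # used only during training
        joint = torch.cat([obs, actions_onehot], dim=-1).flatten()
        return self.critic(joint)
```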
We propose a targeted communication architecture for multi-agent reinforcement learning, where agents learn both what messages to send and whom to address them to while performing cooperative tasks in partially-observable environments. This targeting behavior is learned solely from downstream task-specific reward, without any supervision of the communication.
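At a high level, such targeting can be realized with signature-based attention: each agent broadcasts a key and a message value, each receiver forms a query, and soft attention over the keys decides whose messages it reads. The module below is an illustrative sketch of that idea, not the paper's exact architecture.

```python
# Illustrative attention-based targeted messaging; names are hypothetical.
import torch
import torch.nn as nn

class TargetedComm(nn.Module):
    def __init__(self, d_hid, d_key, d_msg):
        super().__init__()
        self.key = nn.Linear(d_hid, d_key)     # "whom this message is for"
        self.query = nn.Linear(d_hid, d_key)   # "what I want to hear"
        self.value = nn.Linear(d_hid, d_msg)   # message content

    def forward(self, h):                      # h: (n_agents, d_hid)
        k, q, v = self.key(h), self.query(h), self.value(h)
        # Soft attention over senders decides message routing.
        attn = torch.softmax(q @ k.t() / k.size(-1) ** 0.5, dim=-1)
        return attn @ v                        # aggregated message per agent
```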