No Arabic abstract
In multi-agent systems, complex interacting behaviors arise due to the high correlations among agents. However, previous work on modeling multi-agent interactions from demonstrations is primarily constrained by assuming the independence among policies and their reward structures. In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework with explicit modeling of correlated policies by approximating opponents policies, which can recover agents policies that can regenerate similar interactions. Consequently, we develop a Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL), which allows for decentralized training and execution. Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to the demonstrators and outperforms state-of-the-art multi-agent imitation learning methods. Our code is available at url{https://github.com/apexrl/CoDAIL}.
Two-player, constant-sum games are well studied in the literature, but there has been limited progress outside of this setting. We propose Joint Policy-Space Response Oracles (JPSRO), an algorithm for training agents in n-player, general-sum extensive form games, which provably converges to an equilibrium. We further suggest correlated equilibria (CE) as promising meta-solvers, and propose a novel solution concept Maximum Gini Correlated Equilibrium (MGCE), a principled and computationally efficient family of solutions for solving the correlated equilibrium selection problem. We conduct several experiments using CE meta-solvers for JPSRO and demonstrate convergence on n-player, general-sum games.
We study fairness through the lens of cooperative multi-agent learning. Our work is motivated by empirical evidence that naive maximization of team reward yields unfair outcomes for individual team members. To address fairness in multi-agent contexts, we introduce team fairness, a group-based fairness measure for multi-agent learning. We then prove that it is possible to enforce team fairness during policy optimization by transforming the teams joint policy into an equivariant map. We refer to our multi-agent learning strategy as Fairness through Equivariance (Fair-E) and demonstrate its effectiveness empirically. We then introduce Fairness through Equivariance Regularization (Fair-ER) as a soft-constraint version of Fair-E and show that it reaches higher levels of utility than Fair-E and fairer outcomes than non-equivariant policies. Finally, we present novel findings regarding the fairness-utility trade-off in multi-agent settings; showing that the magnitude of the trade-off is dependent on agent skill level.
In most multiagent applications, communication is essential among agents to coordinate their actions, and thus achieve their goal. However, communication often has a related cost that affects overall system performance. In this paper, we draw inspiration from studies of epistemic planning to develop a communication model for agents that allows them to cooperate and make communication decisions effectively within a planning task. The proposed model treats a communication process as an action that modifies the epistemic state of the team. In two simulated tasks, we evaluate whether agents can cooperate effectively and achieve higher performance using communication protocol modeled in our epistemic planning framework. Based on an empirical study conducted using search and rescue tasks with different scenarios, our results show that the proposed model improved team performance across all scenarios compared with baseline models.
Unsupervised skill discovery drives intelligent agents to explore the unknown environment without task-specific reward signal, and the agents acquire various skills which may be useful when the agents adapt to new tasks. In this paper, we propose Multi-agent Skill Discovery(MASD), a method for discovering skills for coordination patterns of multiple agents. The proposed method aims to maximize the mutual information between a latent code Z representing skills and the combination of the states of all agents. Meanwhile it suppresses the empowerment of Z on the state of any single agent by adversarial training. In another word, it sets an information bottleneck to avoid empowerment degeneracy. First we show the emergence of various skills on the level of coordination in a general particle multi-agent environment. Second, we reveal that the bottleneck prevents skills from collapsing to a single agent and enhances the diversity of learned skills. Finally, we show the pretrained policies have better performance on supervised RL tasks.
With the emergence of autonomous vehicles, it is important to understand their impact on the transportation system. However, conventional traffic simulations are time-consuming. In this paper, we introduce an analytical traffic model for unmanaged intersections accounting for microscopic vehicle interactions. The macroscopic property, i.e., delay at the intersection, is modeled as an event-driven stochastic dynamic process, whose dynamics encode the microscopic vehicle behaviors. The distribution of macroscopic properties can be obtained through either direct analysis or event-driven simulation. They are more efficient than conventional (time-driven) traffic simulation, and capture more microscopic details compared to conventional macroscopic flow models. We illustrate the efficiency of this method by delay analyses under two different policies at a two-lane intersection. The proposed model allows for 1) efficient and effective comparison among different policies, 2) policy optimization, 3) traffic prediction, and 4) system optimization (e.g., infrastructure and protocol).