No Arabic abstract
The early literature on epistemic logic in philosophy focused on reasoning about the knowledge or belief of a single agent, especially on controversies about introspection axioms such as the 4 and 5 axioms. By contrast, the later literature on epistemic logic in computer science and game theory has focused on multi-agent epistemic reasoning, with the single-agent 4 and 5 axioms largely taken for granted. In the relevant multi-agent scenarios, it is often important to reason about what agent A believes about what agent B believes about what agent A believes; but it is rarely important to reason just about what agent A believes about what agent A believes. This raises the question of the extent to which single-agent introspection axioms actually matter for multi-agent epistemic reasoning. In this paper, we formalize and answer this question. To formalize the question, we first define a set of multi-agent formulas that we call agent-alternating formulas, including formulas like Box_a Box_b Box_a p but not formulas like Box_a Box_a p. We then prove, for the case of belief, that if one starts with multi-agent K or KD, then adding both the 4 and 5 axioms (or adding the B axiom) does not allow the derivation of any new agent-alternating formulas -- in this sense, introspection axioms do not matter. By contrast, we show that such conservativity results fail for knowledge and multi-agent KT, though they hold with respect to a smaller class of agent-nonrepeating formulas.
We introduce a new semantics for a multi-agent epistemic operator of knowing how, based on an indistinguishability relation between plans. Our proposal is, arguably, closer to the standard presentation of knowing that modalities in classical epistemic logic. We study the relationship between this semantics and previous approaches, showing that our setting is general enough to capture them. We also define a sound and complete axiomatization, and investigate the computational complexity of its model checking and satisfiability problems.
We present a scalable tree search planning algorithm for large multi-agent sequential decision problems that require dynamic collaboration. Teams of agents need to coordinate decisions in many domains, but naive approaches fail due to the exponential growth of the joint action space with the number of agents. We circumvent this complexity through an anytime approach that allows us to trade computation for approximation quality and also dynamically coordinate actions. Our algorithm comprises three elements: online planning with Monte Carlo Tree Search (MCTS), factored representations of local agent interactions with coordination graphs, and the iterative Max-Plus method for joint action selection. We evaluate our approach on the benchmark SysAdmin domain with static coordination graphs and achieve comparable performance with much lower computation cost than our MCTS baselines. We also introduce a multi-drone delivery domain with dynamic, i.e., state-dependent coordination graphs, and demonstrate how our approach scales to large problems on this domain that are intractable for other MCTS methods. We provide an open-source implementation of our algorithm at https://github.com/JuliaPOMDP/FactoredValueMCTS.jl.
Multi-agent spatiotemporal modeling is a challenging task from both an algorithmic design and computational complexity perspective. Recent work has explored the efficacy of traditional deep sequential models in this domain, but these architectures are slow and cumbersome to train, particularly as model size increases. Further, prior attempts to model interactions between agents across time have limitations, such as imposing an order on the agents, or making assumptions about their relationships. In this paper, we introduce baller2vec, a multi-entity generalization of the standard Transformer that can, with minimal assumptions, simultaneously and efficiently integrate information across entities and time. We test the effectiveness of baller2vec for multi-agent spatiotemporal modeling by training it to perform two different basketball-related tasks: (1) simultaneously forecasting the trajectories of all players on the court and (2) forecasting the trajectory of the ball. Not only does baller2vec learn to perform these tasks well (outperforming a graph recurrent neural network with a similar number of parameters by a wide margin), it also appears to understand the game of basketball, encoding idiosyncratic qualities of players in its embeddings, and performing basketball-relevant functions with its attention heads.
Centralized Training with Decentralized Execution (CTDE) has been a popular paradigm in cooperative Multi-Agent Reinforcement Learning (MARL) settings and is widely used in many real applications. One of the major challenges in the training process is credit assignment, which aims to deduce the contributions of each agent according to the global rewards. Existing credit assignment methods focus on either decomposing the joint value function into individual value functions or measuring the impact of local observations and actions on the global value function. These approaches lack a thorough consideration of the complicated interactions among multiple agents, leading to an unsuitable assignment of credit and subsequently mediocre results on MARL. We propose Shapley Counterfactual Credit Assignment, a novel method for explicit credit assignment which accounts for the coalition of agents. Specifically, Shapley Value and its desired properties are leveraged in deep MARL to credit any combinations of agents, which grants us the capability to estimate the individual credit for each agent. Despite this capability, the main technical difficulty lies in the computational complexity of Shapley Value who grows factorially as the number of agents. We instead utilize an approximation method via Monte Carlo sampling, which reduces the sample complexity while maintaining its effectiveness. We evaluate our method on StarCraft II benchmarks across different scenarios. Our method outperforms existing cooperative MARL algorithms significantly and achieves the state-of-the-art, with especially large margins on tasks with more severe difficulties.
When designing systems that are complex, dynamic and stochastic in nature, simulation is generally recognised as one of the best design support technologies, and a valuable aid in the strategic and tactical decision making process. A simulation model consists of a set of rules that define how a system changes over time, given its current state. Unlike analytical models, a simulation model is not solved but is run and the changes of system states can be observed at any point in time. This provides an insight into system dynamics rather than just predicting the output of a system based on specific inputs. Simulation is not a decision making tool but a decision support tool, allowing better informed decisions to be made. Due to the complexity of the real world, a simulation model can only be an approximation of the target system. The essence of the art of simulation modelling is abstraction and simplification. Only those characteristics that are important for the study and analysis of the target system should be included in the simulation model.