In most multiagent applications, communication is essential for agents to coordinate their actions and thus achieve their goal. However, communication often incurs a cost that affects overall system performance. In this paper, we draw inspiration from studies of epistemic planning to develop a communication model that allows agents to cooperate and make communication decisions effectively within a planning task. The proposed model treats a communication process as an action that modifies the epistemic state of the team. In two simulated tasks, we evaluate whether agents can cooperate effectively and achieve higher performance using a communication protocol modeled in our epistemic planning framework. In an empirical study conducted on search and rescue tasks with different scenarios, the proposed model improved team performance across all scenarios compared with baseline models.
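To make the idea of communication as an epistemic action concrete, the following is a minimal sketch assuming a simple set-of-facts belief representation; the class names and the search-and-rescue facts are illustrative assumptions, not the paper's actual formalism. A planner would weigh the action's cost against the value of the resulting team epistemic state.

```python
# Minimal sketch: communication as a planning action with a cost and an
# epistemic effect. All names here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class EpistemicState:
    """Team epistemic state: each agent's set of believed facts."""
    beliefs: dict[str, set[str]] = field(default_factory=dict)

@dataclass
class CommunicateAction:
    """An action that updates the receiver's beliefs at some cost."""
    sender: str
    receiver: str
    fact: str
    cost: float = 1.0

    def applicable(self, state: EpistemicState) -> bool:
        # The sender can only share a fact it actually believes.
        return self.fact in state.beliefs.get(self.sender, set())

    def apply(self, state: EpistemicState) -> EpistemicState:
        new_beliefs = {a: set(b) for a, b in state.beliefs.items()}
        new_beliefs.setdefault(self.receiver, set()).add(self.fact)
        return EpistemicState(new_beliefs)

# A planner can now trade `cost` against the value of the receiver
# knowing `fact`, e.g. whether it unblocks the receiver's next action.
state = EpistemicState({"scout": {"victim_at_B3"}, "medic": set()})
act = CommunicateAction("scout", "medic", "victim_at_B3", cost=0.5)
if act.applicable(state):
    state = act.apply(state)
assert "victim_at_B3" in state.beliefs["medic"]
```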
Multi-agent Markov Decision Processes (MMDPs) arise in a variety of applications, including target tracking, control of multi-robot swarms, and multiplayer games. A key challenge in MMDPs is that the state and action spaces grow exponentially in the number of agents, making computation of an optimal policy intractable for medium- to large-scale problems. One property that has been exploited to mitigate this complexity is transition independence, in which each agent's transition probabilities are independent of the states and actions of other agents. Transition independence enables factorization of the MMDP and computation of local agent policies, but it does not hold for arbitrary MMDPs. In this paper, we propose an approximate transition dependence property, called $\delta$-transition dependence, and develop a metric for quantifying how far an MMDP deviates from transition independence. Our definition of $\delta$-transition dependence recovers transition independence as a special case when $\delta$ is zero. We develop an algorithm, polynomial in the number of agents, that achieves a provable bound on the global optimum when the reward functions are monotone increasing and submodular in the agent actions. We evaluate our approach on two case studies, namely, multi-robot control and multi-agent patrolling.
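As an illustration of quantifying deviation from transition independence, here is a minimal sketch assuming access to a per-agent marginal transition oracle; the total-variation formulation below is one natural instantiation, not necessarily the paper's exact definition, and the brute-force enumeration is for clarity only.

```python
# Minimal sketch: estimate delta as the largest total-variation distance
# between an agent's marginal next-state distributions across joint
# contexts that agree on that agent's own state and action. delta == 0
# then recovers exact transition independence.
import itertools
import numpy as np

def delta_transition_dependence(marginal, joint_states, joint_actions, n_agents):
    """`marginal(i, s, a)` is an assumed oracle returning agent i's
    next-state distribution (a NumPy vector) under joint state `s`
    and joint action `a`."""
    delta = 0.0
    for i in range(n_agents):
        # Group joint contexts by agent i's own (state, action) pair.
        groups = {}
        for s, a in itertools.product(joint_states, joint_actions):
            groups.setdefault((s[i], a[i]), []).append(marginal(i, s, a))
        for dists in groups.values():
            for p, q in itertools.combinations(dists, 2):
                delta = max(delta, 0.5 * np.abs(p - q).sum())
    return delta

if __name__ == "__main__":
    S = list(itertools.product([0, 1], repeat=2))  # joint states, 2 agents
    A = list(itertools.product([0, 1], repeat=2))  # joint actions

    def marginal(i, s, a):
        # Fully independent toy dynamics: agent i's next state depends
        # only on (s[i], a[i]); hence delta should be exactly 0.
        p = 0.7 if s[i] == a[i] else 0.3
        return np.array([p, 1.0 - p])

    print(delta_transition_dependence(marginal, S, A, n_agents=2))  # 0.0
```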
In future intelligent transportation systems, networked vehicles coordinate with each other to achieve safe operation, based on the assumption that communication among vehicles and infrastructure is reliable. Traditional methods usually design the control system and the communication network separately. However, control and communication systems are tightly coupled, as the motions of vehicles affect overall communication quality. Hence, we are motivated to study the co-design of both control and communication systems. In particular, we propose a control-theoretic framework for distributed motion planning for multi-agent systems that satisfies complex, high-level spatial and temporal specifications while simultaneously accounting for communication quality. Towards this end, desired motion specifications and communication performance requirements are formulated as signal temporal logic (STL) and spatial-temporal logic (SpaTeL) formulas, respectively. The specifications are encoded as constraints on system and environment state variables of a mixed-integer linear program (MILP), upon which control strategies satisfying both STL and SpaTeL specifications are generated for each agent by employing a distributed model predictive control (MPC) framework. The effectiveness of the proposed framework is validated by a simulation of distributed communication-aware motion planning for multi-agent systems.
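For a flavor of how a temporal-logic requirement becomes MILP constraints, below is a minimal sketch of the standard big-M encoding of an STL "eventually" predicate over a short horizon. PuLP, the toy single-integrator dynamics, and all numbers are illustrative assumptions, not the paper's formulation.

```python
# Minimal sketch: big-M MILP encoding of the STL formula F_[0,T](x >= c)
# ("eventually x exceeds c within the horizon"). z[t] is a binary
# satisfaction variable for the predicate at step t.
import pulp

T, c, M = 5, 2.0, 100.0
prob = pulp.LpProblem("stl_eventually", pulp.LpMinimize)

x = [pulp.LpVariable(f"x_{t}", lowBound=-10, upBound=10) for t in range(T + 1)]
u = [pulp.LpVariable(f"u_{t}", lowBound=-1, upBound=1) for t in range(T)]
z = [pulp.LpVariable(f"z_{t}", cat="Binary") for t in range(T + 1)]

prob += pulp.lpSum(u)  # placeholder linear objective: minimize effort

prob += x[0] == 0
for t in range(T):
    prob += x[t + 1] == x[t] + u[t]   # toy dynamics: x' = x + u

for t in range(T + 1):
    # Big-M link: z[t] = 1 forces the predicate x[t] >= c at step t.
    prob += x[t] >= c - M * (1 - z[t])

# "Eventually": the predicate must hold at some step in [0, T].
prob += pulp.lpSum(z) >= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("trajectory:", [x_t.value() for x_t in x])
```

Conjunctions, disjunctions, and "always" operators compose in the same way, with additional binaries and linear links, which is what lets the MPC layer solve one MILP per planning step.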
We study the problem of minimizing the resource capacity of autonomous agents cooperating to achieve a shared task. More specifically, we consider high-level planning for a team of homogeneous agents that operate under resource constraints in stochastic environments and share a common goal: given a set of target locations, ensure that each location will be visited infinitely often by some agent almost surely. We formalize the dynamics of agents using consumption Markov decision processes. In a consumption Markov decision process, the agent has a resource of limited capacity. Each action of the agent may consume some amount of the resource. To avoid exhaustion, the agent can replenish its resource to full capacity in designated reload states. The resource capacity restricts the capabilities of the agent. The objective is to assign target locations to agents, so that each agent is only responsible for visiting the assigned subset of target locations repeatedly. Moreover, the assignment must ensure that the agents can carry out their tasks with minimal resource capacity. We reduce the problem of finding target assignments for a team of agents with the lowest possible capacity to an equivalent graph-theoretical problem. We develop an algorithm that solves this graph problem in time that is \emph{polynomial} in the number of agents, target locations, and size of the consumption Markov decision process. We demonstrate the applicability and scalability of the algorithm in a scenario where hundreds of unmanned underwater vehicles monitor hundreds of locations in environments with stochastic ocean currents.
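As a small illustration of the capacity question on a graph abstraction, consider the special case of a single agent assigned a fixed cyclic route over its targets: the smallest sufficient capacity is the largest consumption accumulated between consecutive reload states. The sketch below computes this; the cyclic-route setting and all names are simplifying assumptions, not the paper's general algorithm.

```python
# Minimal sketch: smallest capacity sufficient to traverse a cycle
# forever, given worst-case edge consumptions and reload nodes.

def min_capacity_for_cycle(edge_cost, is_reload):
    """edge_cost[i]: consumption of edge i -> i+1 (mod n);
    is_reload[i]: True if node i lets the agent refill to capacity."""
    n = len(edge_cost)
    if not any(is_reload):
        return float("inf")  # no reload: the cycle cannot run forever
    # Walk the cycle starting from a reload node and track the peak
    # consumption accumulated before each next reload.
    start = is_reload.index(True)
    peak, run = 0, 0
    for k in range(n):
        i = (start + k) % n
        run += edge_cost[i]
        peak = max(peak, run)
        if is_reload[(i + 1) % n]:
            run = 0
    return peak

# Four targets on a cycle; nodes 0 and 2 are reload states.
print(min_capacity_for_cycle([3, 4, 2, 5], [True, False, True, False]))  # 7
```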
In multi-agent systems, complex interacting behaviors arise from the high correlations among agents. However, previous work on modeling multi-agent interactions from demonstrations is largely constrained by the assumption that agents' policies and their reward structures are independent. In this paper, we cast the multi-agent interaction modeling problem into a multi-agent imitation learning framework with explicit modeling of correlated policies by approximating opponents' policies, which can recover agents' policies that regenerate similar interactions. Consequently, we develop a Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL), which allows for decentralized training and execution. Various experiments demonstrate that CoDAIL better regenerates complex interactions close to the demonstrators' and outperforms state-of-the-art multi-agent imitation learning methods. Our code is available at \url{https://github.com/apexrl/CoDAIL}.
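To illustrate the correlated-policy idea, here is a minimal NumPy sketch in which an agent conditions its policy on an approximate model of its opponent's action distribution and marginalizes over it; the tabular representation and all names are illustrative assumptions, whereas CoDAIL itself learns these components adversarially from demonstrations.

```python
# Minimal sketch: acting with a policy that is explicitly conditioned on
# a learned opponent model, then marginalized to pick an action.
import numpy as np

rng = np.random.default_rng(0)
N_OBS, N_ACT = 4, 3

# pi_i(a_i | obs, a_-i): tabular policy conditioned on the opponent's
# action; opp_model(a_-i | obs): approximate opponent policy.
policy = rng.dirichlet(np.ones(N_ACT), size=(N_OBS, N_ACT))  # [obs, a_opp, a_i]
opp_model = rng.dirichlet(np.ones(N_ACT), size=N_OBS)        # [obs, a_opp]

def act(obs: int) -> int:
    # Correlated policy via marginalization over the opponent model:
    # pi(a_i | obs) = sum_{a_-i} opp_model(a_-i | obs) * pi(a_i | obs, a_-i)
    marginal = opp_model[obs] @ policy[obs]  # shape: [N_ACT]
    return int(rng.choice(N_ACT, p=marginal))

print(act(obs=2))
```

Because each agent carries its own opponent model, action selection needs no central coordinator, which is what permits decentralized execution.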
Agent technology, a new paradigm in software engineering, has received attention from research and industry since the 1990s. However, it is still not widely used to date because it requires expertise in both programming and agent technology; gaps among requirements, agent design, and agent deployment pose further difficulties. The Goal Net methodology attempts to solve these issues with a goal-oriented approach that resembles human behaviours, together with an agent designer that supports agent development under this philosophy. However, the existing Goal Net Designer, the design and modelling component of the agent designer, has limitations. These limitations, including limited access, difficult deployment, inflexibility in user operations, design workflows that run against typical Goal Net methodology workflows, and a lack of data protection, have inhibited widespread adoption of the Goal Net methodology. Motivated by this, this book focuses on improvements to the Goal Net Designer. In this project, the Goal Net Designer is completely re-implemented using new technology with an optimised software architecture and design. It allows access from all major desktop operating systems, as well as in a web environment via all modern browsers. Enhancements such as refined workflows, a model validation tool, access control, a team collaboration tool, and a link to the compiler make the Goal Net Designer a fully functional and powerful Integrated Development Environment. User friendliness and usability are greatly enhanced by simplifying the actions users take to accomplish their tasks. User behaviour logging and a quantitative feedback channel are also included to allow the Goal Net Designer to continuously evolve with the power of big data analytics in the future. To evaluate the new Goal Net Designer, a teachable agent has been developed using it, and the development process is illustrated in a case study.