This paper addresses tracking of a moving target in a multi-agent network. The target follows linear dynamics corrupted by adversarial noise, i.e., noise that is not generated from a statistical distribution. The location of the target at each time step induces a global time-varying loss function, and the global loss is a sum of local losses, each associated with one agent. The agents' noisy observations may be nonlinear functions of the target state. We formulate this problem as distributed online optimization, where agents communicate with each other to track the minimizer of the global loss. We then propose a decentralized version of the Mirror Descent algorithm and provide a non-asymptotic analysis. Using the notion of dynamic regret, we measure the performance of our algorithm against its offline counterpart in the centralized setting. We prove that the dynamic regret bound scales inversely in the network spectral gap and captures the deviation from the linear dynamics caused by the adversarial noise. Our result subsumes a number of results in the distributed optimization literature. Finally, in a numerical experiment, we verify that our algorithm can be readily implemented for multi-agent tracking with nonlinear observations.
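A minimal sketch of the kind of update this abstract describes, assuming a Euclidean mirror map (so the Bregman step reduces to consensus averaging followed by a gradient step) and quadratic local losses; the weight matrix W, the drifting-target model, and the step size eta are illustrative choices, not the paper's.

```python
import numpy as np

def decentralized_mirror_descent_step(x, W, grads, eta):
    """One round of decentralized online mirror descent with a Euclidean mirror map,
    so the update reduces to consensus averaging followed by a local gradient step."""
    consensus = W @ x               # mix neighbours' estimates over the network
    return consensus - eta * grads  # descend along each agent's local gradient

# Toy usage: 4 agents on a ring tracking a drifting 2-D target with quadratic local losses.
n, d, eta = 4, 2, 0.1
W = 0.5 * np.eye(n) + 0.25 * (np.roll(np.eye(n), 1, axis=0) + np.roll(np.eye(n), -1, axis=0))
x = np.zeros((n, d))
target = np.array([1.0, -2.0])
for t in range(200):
    target = 0.99 * target + 0.01 * np.random.randn(d)  # drifting target (stand-in for adversarial dynamics noise)
    grads = x - target                                   # gradient of 0.5 * ||x_i - target||^2
    x = decentralized_mirror_descent_step(x, W, grads, eta)
print(x.mean(axis=0), target)
```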
We consider the optimal coverage problem where a multi-agent network is deployed in an environment with obstacles to maximize a joint event detection probability. The objective function of this problem is non-convex, and no global optimum is guaranteed by the gradient-based algorithms developed to date. We first show that the objective function is monotone submodular, a class of functions for which a simple greedy algorithm is known to come within a factor of 1 - 1/e (about 0.63) of the optimal solution. We then derive two tighter lower bounds by exploiting the curvature information (total curvature and elemental curvature) of the objective function. We further show that the tightness of these lower bounds is complementary with respect to the sensing capabilities of the agents. The greedy solution can subsequently be used as an initial point for a gradient-based algorithm to obtain solutions even closer to the global optimum. Simulation results show that this approach leads to significantly better performance relative to previously used algorithms.
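The greedy algorithm referenced above can be sketched as follows, using a hypothetical joint detection probability with independent Gaussian-decay sensors; the functions joint_detection and greedy_coverage and the sensing model are illustrative stand-ins for the paper's objective, not its actual formulation.

```python
import itertools
import numpy as np

def joint_detection(positions, points, sigma=1.0):
    """Expected number of grid points detected by at least one agent,
    assuming independent Gaussian-decay sensing (illustrative model only)."""
    d2 = np.array([[np.sum((p - q) ** 2) for q in points] for p in positions])
    p_detect = 1.0 - np.prod(1.0 - np.exp(-d2 / (2 * sigma ** 2)), axis=0)
    return p_detect.sum()

def greedy_coverage(candidates, points, k):
    """Pick k agent locations by greedily maximizing the marginal gain.
    For monotone submodular objectives this is within 1 - 1/e of optimal."""
    chosen = []
    for _ in range(k):
        best = max((c for c in candidates if not any(np.array_equal(c, s) for s in chosen)),
                   key=lambda c: joint_detection(chosen + [c], points))
        chosen.append(best)
    return chosen

# Toy usage: place 3 agents among candidate sites to cover a small grid.
grid = np.array(list(itertools.product(range(5), range(5))), dtype=float)
sites = np.array(list(itertools.product(range(0, 5, 2), range(0, 5, 2))), dtype=float)
print(greedy_coverage(list(sites), grid, k=3))
```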
Effective coordination is crucial for solving multi-agent collaborative (MAC) problems. While centralized reinforcement learning methods can optimally solve small MAC instances, they do not scale to large problems and fail to generalize to scenarios different from those seen during training. In this paper, we consider MAC problems with some intrinsic notion of locality (e.g., geographic proximity) such that interactions between agents and tasks are locally limited. By leveraging this property, we introduce a novel structured prediction approach to assign agents to tasks. At each step, the assignment is obtained by solving a centralized optimization problem (the inference procedure) whose objective function is parameterized by a learned scoring model. We propose different combinations of inference procedures and scoring models capable of representing coordination patterns of increasing complexity. The resulting assignment policy can be efficiently learned on small problem instances and readily reused in problems with more agents and tasks (i.e., zero-shot generalization). We report experimental results on a toy search and rescue problem and on several target selection scenarios in StarCraft: Brood War, in which our model significantly outperforms strong rule-based baselines on instances with 5 times more agents and tasks than those seen during training.
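A minimal sketch of the structured-prediction idea described above, with a bilinear scoring model and a linear-assignment inference step (SciPy's linear_sum_assignment) standing in for the learned scoring models and inference procedures; all names and the feature model are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def score_matrix(agent_feats, task_feats, w):
    """Score each (agent, task) pair with a simple bilinear model s = a^T W t.
    A learned neural scoring model would replace this in practice."""
    return agent_feats @ w @ task_feats.T

def assign(agent_feats, task_feats, w):
    """Inference procedure: maximize the total score over one-to-one assignments."""
    s = score_matrix(agent_feats, task_feats, w)
    rows, cols = linear_sum_assignment(-s)   # SciPy minimizes, so negate the scores
    return list(zip(rows, cols))

# Toy usage: 3 agents, 3 tasks, 2-D features (e.g., positions, so locality enters the score).
rng = np.random.default_rng(0)
agents, tasks = rng.normal(size=(3, 2)), rng.normal(size=(3, 2))
w = np.eye(2)
print(assign(agents, tasks, w))
```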
In this paper, we extend the results from Jiao et al. (2019) on distributed linear quadratic control for leaderless multi-agent systems to the case of distributed linear quadratic tracking control for leader-follower multi-agent systems. Given one autonomous leader and a number of homogeneous followers, we introduce an associated global quadratic cost functional. We assume that the leader shares its state information with at least one of the followers and that the communication between the followers is represented by a connected simple undirected graph. Our objective is to design distributed control laws such that the controlled network reaches tracking consensus and, moreover, the associated cost is smaller than a given tolerance for all initial states bounded in norm by a given radius. We establish a centralized design method for computing such suboptimal control laws, which involves the solution of a single Riccati inequality whose dimension equals that of the local agent dynamics, together with the smallest and largest eigenvalues of a given positive definite matrix associated with the underlying graph. The proposed design method is illustrated by a simulation example.
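A small illustration of the local computation involved, assuming a continuous-time algebraic Riccati equation (via SciPy's solve_continuous_are) as a stand-in for the Riccati inequality, and an illustrative grounded Laplacian as the graph-related positive definite matrix; the paper's exact gain construction is not reproduced here.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, eigh

# Single-agent dynamics: the Riccati equation has this local dimension, not the network dimension.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

# Solve one small algebraic Riccati equation (stand-in for the Riccati inequality in the paper).
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)          # local feedback gain

# Graph-related positive definite matrix: here a grounded Laplacian for a path of 3 followers
# with the leader attached to the first follower (illustrative choice, not the paper's example).
L_grounded = np.array([[ 2.0, -1.0,  0.0],
                       [-1.0,  2.0, -1.0],
                       [ 0.0, -1.0,  1.0]])
eigs = eigh(L_grounded, eigvals_only=True)
lam_min, lam_max = eigs[0], eigs[-1]      # extreme eigenvalues entering the suboptimality bound
print(K, lam_min, lam_max)
```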
Autonomous exploration is an application of growing importance in robotics. A promising strategy is ergodic trajectory planning, whereby an agent spends a fraction of time in each area that is proportional to an information (probability) density function over the environment. In this paper, a decentralized ergodic multi-agent trajectory planning algorithm subject to limited-communication constraints is proposed. The agents' trajectories are designed by optimizing a weighted cost encompassing ergodicity, control energy, and close-distance operation objectives. To solve the underlying optimal control problem, a second-order iterative descent method coupled with a projection operator in the form of an optimal feedback controller is used. Exhaustive numerical analyses show that the multi-agent solution enables much more efficient exploration, in terms of task completion time and control energy distribution, by leveraging collaboration among agents.
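A minimal sketch of the spectral ergodic metric that such planners typically optimize, comparing Fourier (cosine) coefficients of the trajectory's time-average distribution with those of a target density on the unit square; basis normalization constants are omitted, and the paper's full cost, which also weights control energy and close-distance operation, is not reproduced.

```python
import numpy as np

def ergodic_metric(traj, phi_grid, K=5):
    """Spectral ergodic metric on [0,1]^2 (unnormalized cosine basis, illustrative only).
    traj     : (T, 2) trajectory samples in the unit square
    phi_grid : (N, N) target information density on a uniform grid (sums to 1)"""
    N = phi_grid.shape[0]
    xs = (np.arange(N) + 0.5) / N
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    metric = 0.0
    for k1 in range(K):
        for k2 in range(K):
            fk_grid = np.cos(np.pi * k1 * X) * np.cos(np.pi * k2 * Y)
            phi_k = np.sum(phi_grid * fk_grid)                      # coefficient of the target density
            c_k = np.mean(np.cos(np.pi * k1 * traj[:, 0]) *
                          np.cos(np.pi * k2 * traj[:, 1]))          # coefficient of the trajectory average
            lam = (1.0 + k1 ** 2 + k2 ** 2) ** -1.5                 # Sobolev-type weight
            metric += lam * (c_k - phi_k) ** 2
    return metric

# Toy usage: a uniform target density and a random-walk trajectory.
N = 20
phi = np.full((N, N), 1.0 / N ** 2)
traj = np.cumsum(0.05 * np.random.randn(500, 2), axis=0) % 1.0
print(ergodic_metric(traj, phi))
```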
In the last few years, deep multi-agent reinforcement learning (RL) has become a highly active area of research. A particularly challenging class of problems in this area is partially observable, cooperative, multi-agent learning, in which teams of agents must learn to coordinate their behaviour while conditioning only on their private observations. This is an attractive research area since such problems are relevant to a large number of real-world systems and are also more amenable to evaluation than general-sum problems. Standardised environments such as the ALE and MuJoCo have allowed single-agent RL to move beyond toy domains, such as grid worlds. However, there is no comparable benchmark for cooperative multi-agent RL. As a result, most papers in this field use one-off toy problems, making it difficult to measure real progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap. SMAC is based on the popular real-time strategy game StarCraft II and focuses on micromanagement challenges where each unit is controlled by an independent agent that must act based on local observations. We offer a diverse set of challenge maps and recommendations for best practices in benchmarking and evaluation. We also open-source a deep multi-agent RL framework that includes state-of-the-art algorithms. We believe that SMAC can provide a standard benchmark environment for years to come. Videos of our best agents for several SMAC scenarios are available at: https://youtu.be/VZ7zmQ_obZ0.
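A minimal random-agent loop against a SMAC scenario, assuming the open-source smac package and a local StarCraft II installation are available; the map name "8m" and the episode count are illustrative choices.

```python
import numpy as np
from smac.env import StarCraft2Env

env = StarCraft2Env(map_name="8m")            # one of the released challenge maps
env_info = env.get_env_info()
n_agents = env_info["n_agents"]

for episode in range(5):
    env.reset()
    terminated = False
    episode_reward = 0.0
    while not terminated:
        obs = env.get_obs()                   # per-agent local observations (unused by this random policy)
        actions = []
        for agent_id in range(n_agents):
            avail = np.nonzero(env.get_avail_agent_actions(agent_id))[0]
            actions.append(np.random.choice(avail))   # pick a random available action
        reward, terminated, _ = env.step(actions)
        episode_reward += reward
    print(f"Episode {episode}: total reward {episode_reward}")

env.close()
```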