We present a numerical approach to finding optimal trajectories for players in a multi-body, asset-guarding game with nonlinear dynamics and non-convex constraints. Using the Iterative Best Response (IBR) scheme, we solve for each player's optimal strategy assuming the other players' trajectories are known and fixed. Leveraging recent advances in Sequential Convex Programming (SCP), we use SCP as a subroutine within the IBR algorithm to efficiently solve an approximation of each player's constrained trajectory optimization problem. We apply the approach to an asset-guarding game example involving multiple pursuers and a single evader (i.e., n-versus-1 engagements). The resulting evader trajectories are tested in simulation to verify successful evasion against pursuers using conventional intercept guidance laws.
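To make the IBR-plus-SCP pipeline concrete, below is a minimal two-player sketch in Python using cvxpy. It assumes toy single-integrator dynamics and a keep-out distance d_min; the non-convex separation constraint is convexified by linearizing the inter-player distance around the previous iterate and enforced as a penalty so every convex subproblem stays feasible. All names, dynamics, and constants are illustrative, not taken from the paper.

```python
import cvxpy as cp
import numpy as np

def best_response_scp(x0, goal, other, T=20, d_min=1.0, scp_iters=5):
    """One player's best response with the other player's trajectory fixed.
    The non-convex constraint ||x_t - y_t|| >= d_min is convexified by
    linearizing the distance around the previous iterate (the SCP step)."""
    traj = np.linspace(x0, goal, T + 1)               # straight-line warm start
    for _ in range(scp_iters):
        x = cp.Variable((T + 1, 2))
        u = cp.Variable((T, 2))
        cost = cp.sum_squares(x[T] - goal) + 0.1 * cp.sum_squares(u)
        cons = [x[0] == x0]
        for t in range(T):
            cons.append(x[t + 1] == x[t] + u[t])      # single-integrator dynamics
            d = traj[t + 1] - other[t + 1]
            n_hat = d / (np.linalg.norm(d) + 1e-9)    # linearization direction
            # exact-penalty form of the linearized keep-out constraint
            cost += 50 * cp.pos(d_min - n_hat @ (x[t + 1] - other[t + 1]))
        cp.Problem(cp.Minimize(cost), cons).solve()
        traj = x.value
    return traj

def iterative_best_response(x0s, goals, rounds=10):
    """Outer IBR loop: each player re-optimizes against the other's last plan."""
    trajs = [np.linspace(x0, g, 21) for x0, g in zip(x0s, goals)]
    for _ in range(rounds):
        for i in (0, 1):
            trajs[i] = best_response_scp(x0s[i], goals[i], trajs[1 - i])
    return trajs
```

In practice the outer loop would also carry a trust region and a convergence test on the cost; this sketch only shows how the SCP subroutine nests inside the best-response alternation.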
Many problems in robotics involve multiple decision-making agents. To operate efficiently in such settings, a robot must reason about the impact of its decisions on the behavior of other agents. Differential games offer an expressive theoretical framework for formulating these types of multi-agent problems. Unfortunately, most numerical solution techniques scale poorly with state dimension and are rarely used in real-time applications. For this reason, it is common to predict the future decisions of other agents and solve the resulting decoupled, i.e., single-agent, optimal control problem. This decoupling neglects the underlying interactive nature of the problem; however, efficient solution techniques do exist for broad classes of optimal control problems. We take inspiration from one such technique, the iterative linear-quadratic regulator (ILQR), which solves repeated approximations with linear dynamics and quadratic costs. Similarly, our proposed algorithm solves repeated linear-quadratic games. We experimentally benchmark our algorithm in several examples with a variety of initial conditions and show that the resulting strategies exhibit complex interactive behavior. Our results indicate that our algorithm converges reliably and runs in real time. In a three-player, 14-state simulated intersection problem, our algorithm initially converges in < 0.25 s. Receding-horizon invocations converge in < 50 ms in a hardware collision-avoidance test.
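As a sketch of the inner solve that such an algorithm calls repeatedly, the following numpy routine computes feedback-Nash gains for a discrete-time, N-player linear-quadratic game by backward recursion on the coupled first-order conditions. It assumes no cross terms between players' control costs; the interface and names are illustrative, not the paper's code.

```python
import numpy as np

def lq_game_feedback_nash(A, Bs, Qs, Rs, T):
    """Feedback-Nash gains for x_{t+1} = A x + sum_i B_i u_i, where player i
    pays sum_t (x' Q_i x + u_i' R_i u_i). Returns per-player gain sequences."""
    N, n = len(Bs), A.shape[0]
    m = [B.shape[1] for B in Bs]
    Zs = [Q.copy() for Q in Qs]                    # terminal value matrices
    Ks = [[None] * T for _ in range(N)]
    for t in reversed(range(T)):
        # stack the coupled stationarity conditions into one linear system
        S = np.zeros((sum(m), sum(m)))
        Y = np.zeros((sum(m), n))
        r0 = 0
        for i in range(N):
            c0 = 0
            for j in range(N):
                blk = Bs[i].T @ Zs[i] @ Bs[j]
                if i == j:
                    blk = blk + Rs[i]
                S[r0:r0 + m[i], c0:c0 + m[j]] = blk
                c0 += m[j]
            Y[r0:r0 + m[i]] = Bs[i].T @ Zs[i] @ A
            r0 += m[i]
        K = np.linalg.solve(S, Y)                  # all players' gains at once
        r0, F = 0, A.copy()
        for i in range(N):
            Ks[i][t] = K[r0:r0 + m[i]]
            F = F - Bs[i] @ Ks[i][t]               # closed-loop dynamics
            r0 += m[i]
        Zs = [Qs[i] + Ks[i][t].T @ Rs[i] @ Ks[i][t] + F.T @ Zs[i] @ F
              for i in range(N)]
    return Ks
```

An ILQR-style game solver would wrap this recursion in an outer loop that re-linearizes the dynamics and re-quadraticizes the costs around the latest closed-loop trajectory.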
While the topic of mean-field games (MFGs) has a relatively long history, heretofore there has been limited work concerning algorithms for the computation of equilibrium control policies. In this paper, we develop a computable policy iteration algorithm for approximating the mean-field equilibrium in linear-quadratic MFGs with discounted cost. Given the mean field, each agent faces a linear-quadratic tracking problem, the solution of which involves a dynamical system evolving in retrograde time. This makes the development of forward-in-time algorithm updates challenging. By identifying a structural property of the mean-field update operator, namely that it preserves sequences of a particular form, we develop a forward-in-time equilibrium computation algorithm. Bounds that quantify the accuracy of the computed mean-field equilibrium as a function of the algorithm's stopping condition are provided. The optimality of the computed equilibrium is validated numerically. In contrast to recent and concurrent results, our algorithm appears to be the first to study infinite-horizon MFGs with non-stationary mean-field equilibria, though with a focus on the linear-quadratic setting.
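For intuition, here is a minimal scalar sketch of the underlying fixed point: given a guessed mean-field trajectory z, solve the discounted LQ tracking problem by a backward Riccati pass (the retrograde-time system mentioned above), propagate the population mean forward under the resulting policy, and iterate with damping. This Picard-style iteration is illustrative only; the paper's contribution is precisely a forward-in-time update that avoids repeating the backward pass.

```python
import numpy as np

def lq_mfg_equilibrium(a, b, q, r, beta, mu0, T=60, iters=500, damp=0.5, tol=1e-9):
    """Fixed-point sketch for a scalar discounted LQ MFG: dynamics
    x_{t+1} = a x_t + b u_t, stage cost beta**t * (q*(x_t - z_t)**2 + r*u_t**2),
    with z the population mean found self-consistently. (Not the paper's
    forward-in-time algorithm.)"""
    z = np.full(T + 1, mu0)                       # initial mean-field guess
    for _ in range(iters):
        # backward pass: discounted Riccati recursion plus tracking offset
        P = np.zeros(T + 1); s = np.zeros(T + 1)
        P[T], s[T] = q, -q * z[T]
        for t in reversed(range(T)):
            g = r + beta * P[t + 1] * b * b
            K = beta * P[t + 1] * a * b / g
            P[t] = q + beta * P[t + 1] * a * (a - b * K)
            s[t] = -q * z[t] + beta * (a - b * K) * s[t + 1]
        # forward pass: propagate the mean under u_t = -K_t x_t - k_t
        z_new = np.empty(T + 1); z_new[0] = mu0
        for t in range(T):
            g = r + beta * P[t + 1] * b * b
            K = beta * P[t + 1] * a * b / g
            k = beta * b * s[t + 1] / g
            z_new[t + 1] = (a - b * K) * z_new[t] - b * k
        if np.max(np.abs(z_new - z)) < tol:
            break
        z = (1 - damp) * z + damp * z_new         # damped mean-field update
    return z, P
```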
This paper develops an efficient multi-agent deep reinforcement learning algorithm for cooperative control in power grids. Specifically, we consider the decentralized inverter-based secondary voltage control problem in distributed generators (DGs), which is first formulated as a cooperative multi-agent reinforcement learning (MARL) problem. We then propose a novel on-policy MARL algorithm, PowerNet, in which each agent (DG) learns a control policy based on a (sub-)global reward but local states from its neighboring agents. Motivated by the fact that a local control action from one agent has limited impact on agents distant from it, we exploit a novel spatial discount factor to reduce the effect of remote agents, expediting the training process and improving scalability. Furthermore, a differentiable, learning-based communication protocol is employed to foster collaboration among neighboring agents. In addition, to mitigate the effects of system uncertainty and random noise introduced during on-policy learning, we utilize an action smoothing factor to stabilize the policy execution. To facilitate training and evaluation, we develop PGSim, an efficient, high-fidelity power grid simulation platform. Experimental results in two microgrid setups show that the developed PowerNet outperforms a conventional model-based control method, as well as several state-of-the-art MARL algorithms. The decentralized learning scheme and high sample efficiency also make it viable for large-scale power grids.
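The spatial discount idea can be illustrated in a few lines: each agent's training reward is a hop-distance-weighted blend of all local rewards, so remote agents contribute exponentially less. The function below is a sketch under assumed interfaces (array shapes, the name spatially_discounted_rewards), not PowerNet's implementation.

```python
import numpy as np

def spatially_discounted_rewards(local_rewards, hop_dist, alpha=0.6):
    """Blend each agent's reward with its neighbors', down-weighting remote
    agents by alpha**hops (the spatial discount factor described above).
    local_rewards: (N,) array; hop_dist: (N, N) hop-count matrix."""
    w = alpha ** hop_dist                      # spatial discounting weights
    w = w / w.sum(axis=1, keepdims=True)       # normalize per agent
    return w @ local_rewards                   # (N,) effective training rewards

# Example: 3 agents on a line; the middle agent weighs both ends equally.
r = spatially_discounted_rewards(np.array([1.0, 0.0, -1.0]),
                                 np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]]))
```

With alpha = 1 this recovers a fully global (cooperative) reward, and with alpha = 0 a purely local one, which is the scalability trade-off the abstract describes.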
Topology inference is a crucial problem for cooperative control in multi-agent systems. Unlike most prior works, this paper is dedicated to inferring the directed network topology from observations that consist of a single, noisy, finite time-series system trajectory, where the cooperation dynamics are driven by the initial network state and an unmeasurable latent input. The unmeasurable latent input refers to intrinsic system signals and extrinsic environmental interference. Considering the time-invariant/varying nature of the input, we propose two-layer optimization-based and iterative estimation-based topology inference algorithms (TO-TIA and IE-TIA), respectively. TO-TIA allows us to capture the separability of the global agent state and eliminates the unknown influence of time-invariant input on the system dynamics. IE-TIA further exploits the identifiability and estimability of more general time-varying input and provides an asymptotic solution with desired convergence properties, at a higher computational cost than TO-TIA. Our novel algorithms relax the dependence on observation scale and leverage an empirical risk reformulation to improve the inference accuracy in terms of both topology structure and edge weights. Comprehensive theoretical analysis and simulations for various topologies are provided to illustrate the inference feasibility and the performance of the proposed algorithms.
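As a toy illustration of how a time-invariant latent input can be eliminated (the idea behind handling the time-invariant case), consider linear dynamics x_{t+1} = W x_t + c with an unknown constant c: first differences of the trajectory cancel c, leaving a least-squares problem in W alone. The sketch below is illustrative and omits the empirical-risk reformulation and noise handling of the actual TO-TIA/IE-TIA algorithms.

```python
import numpy as np

def infer_topology_const_input(X):
    """Least-squares topology estimate from one trajectory X (n x T), assuming
    x_{t+1} = W x_t + c with unknown constant input c. Differencing cancels c:
    dx_{t+1} = W dx_t. (A sketch, not the paper's algorithm.)"""
    dX = np.diff(X, axis=1)                   # first differences cancel c
    A, B = dX[:, :-1], dX[:, 1:]              # B ~= W A, column by column
    W_hat = B @ np.linalg.pinv(A)             # least-squares estimate of W
    return W_hat
```

A time-varying input breaks this cancellation, which is why the more general case calls for the iterative estimation scheme described above.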
Pursuit-evasion games are ubiquitous in nature and in the artificial world. In nature, pursuer(s) and evader(s) are intelligent agents that can learn from experience, and dynamics (i.e., Newtonian or Lagrangian) is vital for the pursuer and the evader in some scenarios. To this end, this paper addresses the pursuit-evasion game of intelligent agents from the perspective of dynamics. A bio-inspired dynamics formulation of a pursuit-evasion game and baseline pursuit and evasion strategies are introduced first. Then, reinforcement learning techniques are used to mimic the ability of intelligent agents to learn from experience. Based on the dynamics formulation and reinforcement learning techniques, the effects of improving both pursuit and evasion strategies from experience are investigated at two levels: 1) individual runs and 2) ranges of the parameters of pursuit-evasion games. Results of the investigation are consistent with observations of nature and the natural law of survival of the fittest. More importantly, compared with the outcome of a pursuit-evasion game between agents with baseline strategies, this study reaches a different conclusion. It is shown that, in a pursuit-evasion game with a dynamics formulation, an evader cannot escape from a slightly faster pursuer that has learned an effective pursuit strategy, even when the evader uses agile maneuvers and an effective learned evasion strategy.
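For concreteness, a baseline pursuit strategy of the kind learned policies are typically compared against can be as simple as accelerating along the line of sight under point-mass (Newtonian) dynamics. The snippet below is a generic pure-pursuit sketch with hypothetical names, not the paper's bio-inspired formulation.

```python
import numpy as np

def pure_pursuit_accel(p_pos, e_pos, a_max):
    """Baseline pursuer: accelerate toward the evader's current position,
    saturated at the acceleration limit a_max."""
    los = e_pos - p_pos                        # line-of-sight vector
    return a_max * los / (np.linalg.norm(los) + 1e-9)

def step(pos, vel, accel, dt=0.05):
    """One explicit-Euler step of point-mass (Newtonian) dynamics."""
    return pos + dt * vel, vel + dt * accel
```

A learned pursuit or evasion policy would replace the acceleration command above with the output of a trained policy network while keeping the same dynamics step.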