ترغب بنشر مسار تعليمي؟ اضغط هنا

A Dynamics Perspective of Pursuit-Evasion Games of Intelligent Agents with the Ability to Learn

137   0   0.0 ( 0 )
 نشر من قبل Hao Xiong
 تاريخ النشر 2021
والبحث باللغة English




اسأل ChatGPT حول البحث

Pursuit-evasion games are ubiquitous in nature and in an artificial world. In nature, pursuer(s) and evader(s) are intelligent agents that can learn from experience, and dynamics (i.e., Newtonian or Lagrangian) is vital for the pursuer and the evader in some scenarios. To this end, this paper addresses the pursuit-evasion game of intelligent agents from the perspective of dynamics. A bio-inspired dynamics formulation of a pursuit-evasion game and baseline pursuit and evasion strategies are introduced at first. Then, reinforcement learning techniques are used to mimic the ability of intelligent agents to learn from experience. Based on the dynamics formulation and reinforcement learning techniques, the effects of improving both pursuit and evasion strategies based on experience on pursuit-evasion games are investigated at two levels 1) individual runs and 2) ranges of the parameters of pursuit-evasion games. Results of the investigation are consistent with nature observations and the natural law - survival of the fittest. More importantly, with respect to the result of a pursuit-evasion game of agents with baseline strategies, this study achieves a different result. It is shown that, in a pursuit-evasion game with a dynamics formulation, an evader is not able to escape from a slightly faster pursuer with an effective learned pursuit strategy, based on agile maneuvers and an effective learned evasion strategy.

قيم البحث

اقرأ أيضاً

In this paper, a general nonlinear 1st-order consensus-based solution for distributed constrained convex optimization is considered for applications in network resource allocation. The proposed continuous-time solution is used to optimize continuousl y-differentiable strictly convex cost functions over weakly-connected undirected multi-agent networks. The solution is anytime feasible and models various nonlinearities to account for imperfections and constraints on the (physical model of) agents in terms of their limited actuation capabilities, e.g., quantization and saturation constraints among others. Moreover, different applications impose specific nonlinearities to the model, e.g., convergence in fixed/finite-time, robustness to uncertainties, and noise-tolerant dynamics. Our proposed distributed resource allocation protocol generalizes such nonlinear models. Putting convex set analysis together with the Lyapunov theorem, we provide a general technique to prove convergence (i) regardless of the particular type of nonlinearity (ii) with weak network-connectivity requirement (i.e., uniform-connectivity). We simulate the performance of the protocol in continuous-time coordination of generators, known as the economic dispatch problem (EDP).
Here we present a decomposition technique for a class of differential games. The technique consists in a decomposition of the target set which produces, for geometrical reasons, a decomposition in the dimensionality of the problem. Using some element s of Hamilton-Jacobi equations theory, we find a relation between the regularity of the solution and the possibility to decompose the problem. We use this technique to solve a pursuit evasion game with multiple agents.
This paper introduces two ongoing research projects which seek to apply computer modelling techniques in order to simulate human behaviour within organisations. Previous research in other disciplines has suggested that complex social behaviours are g overned by relatively simple rules which, when identified, can be used to accurately model such processes using computer technology. The broad objective of our research is to develop a similar capability within organisational psychology.
We present a numerical approach to finding optimal trajectories for players in a multi-body, asset-guarding game with nonlinear dynamics and non-convex constraints. Using the Iterative Best Response (IBR) scheme, we solve for each players optimal str ategy assuming the other players trajectories are known and fixed. Leveraging recent advances in Sequential Convex Programming (SCP), we use SCP as a subroutine within the IBR algorithm to efficiently solve an approximation of each players constrained trajectory optimization problem. We apply the approach to an asset-guarding game example involving multiple pursuers and a single evader (i.e., n-versus-1 engagements). Resulting evader trajectories are tested in simulation to verify successful evasion against pursuers using conventional intercept guidance laws.
While the topic of mean-field games (MFGs) has a relatively long history, heretofore there has been limited work concerning algorithms for the computation of equilibrium control policies. In this paper, we develop a computable policy iteration algori thm for approximating the mean-field equilibrium in linear-quadratic MFGs with discounted cost. Given the mean-field, each agent faces a linear-quadratic tracking problem, the solution of which involves a dynamical system evolving in retrograde time. This makes the development of forward-in-time algorithm updates challenging. By identifying a structural property of the mean-field update operator, namely that it preserves sequences of a particular form, we develop a forward-in-time equilibrium computation algorithm. Bounds that quantify the accuracy of the computed mean-field equilibrium as a function of the algorithms stopping condition are provided. The optimality of the computed equilibrium is validated numerically. In contrast to the most recent/concurrent results, our algorithm appears to be the first to study infinite-horizon MFGs with non-stationary mean-field equilibria, though with focus on the linear quadratic setting.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا