ترغب بنشر مسار تعليمي؟ اضغط هنا

Efficient Iterative Linear-Quadratic Approximations for Nonlinear Multi-Player General-Sum Differential Games

312   0   0.0 ( 0 )
 نشر من قبل David Fridovich-Keil
 تاريخ النشر 2019
والبحث باللغة English




اسأل ChatGPT حول البحث

Many problems in robotics involve multiple decision making agents. To operate efficiently in such settings, a robot must reason about the impact of its decisions on the behavior of other agents. Differential games offer an expressive theoretical framework for formulating these types of multi-agent problems. Unfortunately, most numerical solution techniques scale poorly with state dimension and are rarely used in real-time applications. For this reason, it is common to predict the future decisions of other agents and solve the resulting decoupled, i.e., single-agent, optimal control problem. This decoupling neglects the underlying interactive nature of the problem; however, efficient solution techniques do exist for broad classes of optimal control problems. We take inspiration from one such technique, the iterative linear-quadratic regulator (ILQR), which solves repeated approximations with linear dynamics and quadratic costs. Similarly, our proposed algorithm solves repeated linear-quadratic games. We experimentally benchmark our algorithm in several examples with a variety of initial conditions and show that the resulting strategies exhibit complex interactive behavior. Our results indicate that our algorithm converges reliably and runs in real-time. In a three-player, 14-state simulated intersection problem, our algorithm initially converges in < 0.25s. Receding horizon invocations converge in < 50 ms in a hardware collision-avoidance test.



قيم البحث

اقرأ أيضاً

Iterative linear-quadratic (ILQ) methods are widely used in the nonlinear optimal control community. Recent work has applied similar methodology in the setting of multiplayer general-sum differential games. Here, ILQ methods are capable of finding lo cal equilibria in interactive motion planning problems in real-time. As in most iterative procedures, however, this approach can be sensitive to initial conditions and hyperparameter choices, which can result in poor computational performance or even unsafe trajectories. In this paper, we focus our attention on a broad class of dynamical systems which are feedback linearizable, and exploit this structure to improve both algorithmic reliability and runtime. We showcase our new algorithm in three distinct traffic scenarios, and observe that in practice our method converges significantly more often and more quickly than was possible without exploiting the feedback linearizable structure.
We consider a general-sum N-player linear-quadratic game with stochastic dynamics over a finite horizon and prove the global convergence of the natural policy gradient method to the Nash equilibrium. In order to prove the convergence of the method, w e require a certain amount of noise in the system. We give a condition, essentially a lower bound on the covariance of the noise in terms of the model parameters, in order to guarantee convergence. We illustrate our results with numerical experiments to show that even in situations where the policy gradient method may not converge in the deterministic setting, the addition of noise leads to convergence.
We present a numerical approach to finding optimal trajectories for players in a multi-body, asset-guarding game with nonlinear dynamics and non-convex constraints. Using the Iterative Best Response (IBR) scheme, we solve for each players optimal str ategy assuming the other players trajectories are known and fixed. Leveraging recent advances in Sequential Convex Programming (SCP), we use SCP as a subroutine within the IBR algorithm to efficiently solve an approximation of each players constrained trajectory optimization problem. We apply the approach to an asset-guarding game example involving multiple pursuers and a single evader (i.e., n-versus-1 engagements). Resulting evader trajectories are tested in simulation to verify successful evasion against pursuers using conventional intercept guidance laws.
251 - Jingrui Sun 2020
The paper studies the open-loop saddle point and the open-loop lower and upper values, as well as their relationship for two-person zero-sum stochastic linear-quadratic (LQ, for short) differential games with deterministic coefficients. It derives a necessary condition for the finiteness of the open-loop lower and upper values and a sufficient condition for the existence of an open-loop saddle point. It turns out that under the sufficient condition, a strongly regular solution to the associated Riccati equation uniquely exists, in terms of which a closed-loop representation is further established for the open-loop saddle point. Examples are presented to show that the finiteness of the open-loop lower and upper values does not ensure the existence of an open-loop saddle point in general. But for the classical deterministic LQ game, these two issues are equivalent and both imply the solvability of the Riccati equation, for which an explicit representation of the solution is obtained.
While the topic of mean-field games (MFGs) has a relatively long history, heretofore there has been limited work concerning algorithms for the computation of equilibrium control policies. In this paper, we develop a computable policy iteration algori thm for approximating the mean-field equilibrium in linear-quadratic MFGs with discounted cost. Given the mean-field, each agent faces a linear-quadratic tracking problem, the solution of which involves a dynamical system evolving in retrograde time. This makes the development of forward-in-time algorithm updates challenging. By identifying a structural property of the mean-field update operator, namely that it preserves sequences of a particular form, we develop a forward-in-time equilibrium computation algorithm. Bounds that quantify the accuracy of the computed mean-field equilibrium as a function of the algorithms stopping condition are provided. The optimality of the computed equilibrium is validated numerically. In contrast to the most recent/concurrent results, our algorithm appears to be the first to study infinite-horizon MFGs with non-stationary mean-field equilibria, though with focus on the linear quadratic setting.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا