ترغب بنشر مسار تعليمي؟ اضغط هنا

Solving Structured Hierarchical Games Using Differential Backward Induction

397   0   0.0 ( 0 )
 نشر من قبل Shahin Jabbari
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Many real-world systems possess a hierarchical structure where a strategic plan is forwarded and implemented in a top-down manner. Examples include business activities in large companies or policy making for reducing the spread during pandemics. We introduce a novel class of games that we call structured hierarchical games (SHGs) to capture these strategic interactions. In an SHG, each player is represented as a vertex in a multi-layer decision tree and controls a real-valued action vector reacting to orders from its predecessors and influencing its descendants behaviors strategically based on its own subjective utility. SHGs generalize extensive form games as well as Stackelberg games. For general SHGs with (possibly) nonconvex payoffs and high-dimensional action spaces, we propose a new solution concept which we call local subgame perfect equilibrium. By exploiting the hierarchical structure and strategic dependencies in payoffs, we derive a back propagation-style gradient-based algorithm which we call Differential Backward Induction to compute an equilibrium. We theoretically characterize the convergence properties of DBI and empirically demonstrate a large overlap between the stable points reached by DBI and equilibrium solutions. Finally, we demonstrate the effectiveness of our algorithm in finding emph{globally} stable solutions and its scalability for a recently introduced class of SHGs for pandemic policy making.

قيم البحث

اقرأ أيضاً

This article extends the idea of solving parity games by strategy iteration to non-deterministic strategies: In a non-deterministic strategy a player restricts himself to some non-empty subset of possible actions at a given node, instead of limiting himself to exactly one action. We show that a strategy-improvement algorithm by by Bjoerklund, Sandberg, and Vorobyov can easily be adapted to the more general setting of non-deterministic strategies. Further, we show that applying the heuristic of all profitable switches leads to choosing a locally optimal successor strategy in the setting of non-deterministic strategies, thereby obtaining an easy proof of an algorithm by Schewe. In contrast to the algorithm by Bjoerklund et al., we present our algorithm directly for parity games which allows us to compare it to the algorithm by Jurdzinski and Voege: We show that the valuations used in both algorithm coincide on parity game arenas in which one player can surrender. Thus, our algorithm can also be seen as a generalization of the one by Jurdzinski and Voege to non-deterministic strategies. Finally, using non-deterministic strategies allows us to show that the number of improvement steps is bound from above by O(1.724^n). For strategy-improvement algorithms, this bound was previously only known to be attainable by using randomization.
Large-scale screening for potential threats with limited resources and capacity for screening is a problem of interest at airports, seaports, and other ports of entry. Adversaries can observe screening procedures and arrive at a time when there will be gaps in screening due to limited resource capacities. To capture this game between ports and adversaries, this problem has been previously represented as a Stackelberg game, referred to as a Threat Screening Game (TSG). Given the significant complexity associated with solving TSGs and uncertainty in arrivals of customers, existing work has assumed that screenees arrive and are allocated security resources at the beginning of the time window. In practice, screenees such as airport passengers arrive in bursts correlated with flight time and are not bound by fixed time windows. To address this, we propose an online threat screening model in which screening strategy is determined adaptively as a passenger arrives while satisfying a hard bound on acceptable risk of not screening a threat. To solve the online problem with a hard bound on risk, we formulate it as a Reinforcement Learning (RL) problem with constraints on the action space (hard bound on risk). We provide a novel way to efficiently enforce linear inequality constraints on the action output in Deep Reinforcement Learning. We show that our solution allows us to significantly reduce screenee wait time while guaranteeing a bound on risk.
119 - Hugo Gimbert 2009
Simple stochastic games are two-player zero-sum stochastic games with turn-based moves, perfect information, and reachability winning conditions. We present two new algorithms computing the values of simple stochastic games. Both of them rely on the existence of optimal permutation strategies, a class of positional strategies derived from permutations of the random vertices. The permutation-enumeration algorithm performs an exhaustive search among these strategies, while the permutation-improvement algorithm is based on successive improvements, `a la Hoffman-Karp. Our algorithms improve previously known algorithms in several aspects. First they run in polynomial time when the number of random vertices is fixed, so the problem of solving simple stochastic games is fixed-parameter tractable when the parameter is the number of random vertices. Furthermore, our algorithms do not require the input game to be transformed into a stopping game. Finally, the permutation-enumeration algorithm does not use linear programming, while the permutation-improvement algorithm may run in polynomial time.
Zielonkas classic recursive algorithm for solving parity games is perhaps the simplest among the many existing parity game algorithms. However, its complexity is exponential, while currently the state-of-the-art algorithms have quasipolynomial comple xity. Here, we present a modification of Zielonkas classic algorithm that brings its complexity down to $n^{mathcal{O}left(logleft(1+frac{d}{log n}right)right)}$, for parity games of size $n$ with $d$ priorities, in line with previous quasipolynomial-time solutions.
In a mean-payoff parity game, one of the two players aims both to achieve a qualitative parity objective and to minimize a quantitative long-term average of payoffs (aka. mean payoff). The game is zero-sum and hence the aim of the other player is to either foil the parity objective or to maximize the mean payoff. Our main technical result is a pseudo-quasi-polynomial algorithm for solving mean-payoff parity games. All algorithms for the problem that have been developed for over a decade have a pseudo-polynomial and an exponential factors in their running times; in the running time of our algorithm the latter is replaced with a quasi-polynomial one. By the results of Chatterjee and Doyen (2012) and of Schewe, Weinert, and Zimmermann (2018), our main technical result implies that there are pseudo-quasi-polynomial algorithms for solving parity energy games and for solving parity games with weights. Our main conceptual contributions are the definitions of strategy decompositions for both players, and a notion of progress measures for mean-payoff parity games that generalizes both parity and energy progress measures. The former provides normal forms for and succinct representations of winning strategies, and the latter enables the application to mean-payoff parity games of the order-theoretic machinery that underpins a recent quasi-polynomial algorithm for solving parity games.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا