ترغب بنشر مسار تعليمي؟ اضغط هنا

A comparison of the notions of optimality in soft constraints and graphical games

158   0   0.0 ( 0 )
 نشر من قبل Krzysztof R. Apt
 تاريخ النشر 2008
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

The notion of optimality naturally arises in many areas of applied mathematics and computer science concerned with decision making. Here we consider this notion in the context of two formalisms used for different purposes and in different research areas: graphical games and soft constraints. We relate the notion of optimality used in the area of soft constraint satisfaction problems (SCSPs) to that used in graphical games, showing that for a large class of SCSPs that includes weighted constraints every optimal solution corresponds to a Nash equilibrium that is also a Pareto efficient joint strategy.


قيم البحث

اقرأ أيضاً

The notion of optimality naturally arises in many areas of applied mathematics and computer science concerned with decision making. Here we consider this notion in the context of three formalisms used for different purposes in reasoning about multi-a gent systems: strategic games, CP-nets, and soft constraints. To relate the notions of optimality in these formalisms we introduce a natural qualitative modification of the notion of a strategic game. We show then that the optimal outcomes of a CP-net are exactly the Nash equilibria of such games. This allows us to use the techniques of game theory to search for optimal outcomes of CP-nets and vice-versa, to use techniques developed for CP-nets to search for Nash equilibria of the considered games. Then, we relate the notion of optimality used in the area of soft constraints to that used in a generalization of strategic games, called graphical games. In particular we prove that for a natural class of soft constraints that includes weighted constraints every optimal solution is both a Nash equilibrium and Pareto efficient joint strategy. For a natural mapping in the other direction we show that Pareto efficient joint strategies coincide with the optimal solutions of soft constraints.
63 - Adrian Hutter 2020
We consider a scenario in which two reinforcement learning agents repeatedly play a matrix game against each other and update their parameters after each round. The agents decision-making is transparent to each other, which allows each agent to predi ct how their opponent will play against them. To prevent an infinite regress of both agents recursively predicting each other indefinitely, each agent is required to give an opponent-independent response with some probability at least epsilon. Transparency also allows each agent to anticipate and shape the other agents gradient step, i.e. to move to regions of parameter space in which the opponents gradient points in a direction favourable to them. We study the resulting dynamics experimentally, using two algorithms from previous literature (LOLA and SOS) for opponent-aware learning. We find that the combination of mutually transparent decision-making and opponent-aware learning robustly leads to mutual cooperation in a single-shot prisoners dilemma. In a game of chicken, in which both agents try to manoeuvre their opponent towards their preferred equilibrium, converging to a mutually beneficial outcome turns out to be much harder, and opponent-aware learning can even lead to worst-case outcomes for both agents. This highlights the need to develop opponent-aware learning algorithms that achieve acceptable outcomes in social dilemmas involving an equilibrium selection problem.
We study an information-structure design problem (a.k.a. persuasion) with a single sender and multiple receivers with actions of a priori unknown types, independently drawn from action-specific marginal distributions. As in the standard Bayesian pers uasion model, the sender has access to additional information regarding the action types, which she can exploit when committing to a (noisy) signaling scheme through which she sends a private signal to each receiver. The novelty of our model is in considering the case where the receivers interact in a sequential game with imperfect information, with utilities depending on the game outcome and the realized action types. After formalizing the notions of ex ante and ex interim persuasiveness (which differ in the time at which the receivers commit to following the senders signaling scheme), we investigate the continuous optimization problem of computing a signaling scheme which maximizes the senders expected revenue. We show that computing an optimal ex ante persuasive signaling scheme is NP-hard when there are three or more receivers. In contrast with previous hardness results for ex interim persuasion, we show that, for games with two receivers, an optimal ex ante persuasive signaling scheme can be computed in polynomial time thanks to a novel algorithm based on the ellipsoid method which we propose.
Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on determinantal point processes (DPP). By incorporating the diversity metric into best-response dynamics, we develop diverse fictitious play and diverse policy-space response oracle for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the gamescape -- convex polytopes spanned by agents mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve at least the same, and in most games, lower exploitability than PSRO solvers by finding effective and diverse strategies.
115 - Santiago Ontanon 2012
Game tree search algorithms such as minimax have been used with enormous success in turn-based adversarial games such as Chess or Checkers. However, such algorithms cannot be directly applied to real-time strategy (RTS) games because a number of reas ons. For example, minimax assumes a turn-taking game mechanics, not present in RTS games. In this paper we present RTMM, a real-time variant of the standard minimax algorithm, and discuss its applicability in the context of RTS games. We discuss its strengths and weaknesses, and evaluate it in two real-time games.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا