
Comparing the notions of optimality in CP-nets, strategic games and soft constraints

Published by Krzysztof R. Apt
Publication date: 2008
Research field: Informatics Engineering
Language: English





The notion of optimality naturally arises in many areas of applied mathematics and computer science concerned with decision making. Here we consider this notion in the context of three formalisms used for different purposes in reasoning about multi-agent systems: strategic games, CP-nets, and soft constraints. To relate the notions of optimality in these formalisms, we introduce a natural qualitative modification of the notion of a strategic game. We then show that the optimal outcomes of a CP-net are exactly the Nash equilibria of such games. This allows us to use the techniques of game theory to search for optimal outcomes of CP-nets and, vice versa, to use techniques developed for CP-nets to search for Nash equilibria of the considered games. Then we relate the notion of optimality used in the area of soft constraints to that used in a generalization of strategic games, called graphical games. In particular, we prove that for a natural class of soft constraints that includes weighted constraints, every optimal solution is both a Nash equilibrium and a Pareto efficient joint strategy. For a natural mapping in the other direction, we show that Pareto efficient joint strategies coincide with the optimal solutions of soft constraints.
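The correspondence between optimal outcomes of a CP-net and Nash equilibria can be illustrated on a toy instance. The sketch below is not taken from the paper: the two-variable CP-net, the names `CPNET`, `cpnet_optimal`, and `nash_equilibria`, and the reading of "optimal" as "no single variable can be flipped to a value it prefers given its parents" are all assumptions made for illustration. Each variable is treated as a player whose strict preference over its own strategies is fixed by the strategies of its parents.

```python
from itertools import product

# Hypothetical CP-net (illustration only): variable -> (parents, CPT).
# Each CPT maps a tuple of parent values to a ranking, best value first.
CPNET = {
    "A": ((), {(): ["a1", "a2"]}),
    "B": (("A",), {("a1",): ["b1", "b2"], ("a2",): ["b2", "b1"]}),
}
VARS = list(CPNET)
DOMAINS = {
    v: sorted({x for rank in cpt.values() for x in rank})
    for v, (_, cpt) in CPNET.items()
}


def ranking(var, outcome):
    """Ranking of var's values given the parent values in `outcome`."""
    parents, cpt = CPNET[var]
    return cpt[tuple(outcome[p] for p in parents)]


def all_outcomes():
    return [dict(zip(VARS, vals)) for vals in product(*(DOMAINS[v] for v in VARS))]


def cpnet_optimal():
    """Outcomes in which every variable already has its top-ranked value
    (assumed reading of optimality: no improving flip exists)."""
    return [o for o in all_outcomes()
            if all(ranking(v, o)[0] == o[v] for v in VARS)]


def strictly_prefers(player, new, old):
    """Qualitative preference of `player` between two joint strategies
    that differ only in the player's own strategy."""
    order = ranking(player, old)
    return order.index(new[player]) < order.index(old[player])


def nash_equilibria():
    """Joint strategies from which no player gains by deviating alone."""
    result = []
    for s in all_outcomes():
        deviations = (
            (p, {**s, p: alt}) for p in VARS for alt in DOMAINS[p] if alt != s[p]
        )
        if not any(strictly_prefers(p, dev, s) for p, dev in deviations):
            result.append(s)
    return result


print(cpnet_optimal())
print(nash_equilibria())
```

On this instance both functions return the single outcome in which A takes `a1` and B takes `b1`. The two computations are written independently (one scans the CPTs, the other checks unilateral deviations) so that the agreement is visible rather than built in.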




Read also

The notion of optimality naturally arises in many areas of applied mathematics and computer science concerned with decision making. Here we consider this notion in the context of two formalisms used for different purposes and in different research areas: graphical games and soft constraints. We relate the notion of optimality used in the area of soft constraint satisfaction problems (SCSPs) to that used in graphical games, showing that for a large class of SCSPs that includes weighted constraints every optimal solution corresponds to a Nash equilibrium that is also a Pareto efficient joint strategy.
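The Nash-equilibrium half of this claim can be checked by brute force on a small weighted constraint problem. The sketch below rests on assumptions that are not stated in the abstract: the constraints and domains are invented, and the mapping to a game (each variable is a player whose cost is the sum of the costs of the constraints that mention it) is one plausible "local" mapping chosen for illustration, not necessarily the one used by the authors. Pareto efficiency is not checked here.

```python
from itertools import product

# Hypothetical weighted constraint problem (illustration only).
DOMAINS = {"x": [0, 1], "y": [0, 1], "z": [0, 1]}
# Each constraint is (scope, cost function); lower total cost is better.
CONSTRAINTS = [
    (("x", "y"), lambda x, y: 0 if x != y else 2),
    (("y", "z"), lambda y, z: 0 if y == z else 1),
    (("x",), lambda x: 1 if x == 1 else 0),
]


def total_cost(a):
    """Sum of all constraint costs under assignment `a`."""
    return sum(f(*(a[v] for v in scope)) for scope, f in CONSTRAINTS)


def player_cost(p, a):
    """Assumed local mapping: a player pays for the constraints that mention it."""
    return sum(f(*(a[v] for v in scope)) for scope, f in CONSTRAINTS if p in scope)


def assignments():
    names = list(DOMAINS)
    return [dict(zip(names, vals)) for vals in product(*(DOMAINS[n] for n in names))]


def optimal_solutions():
    """Assignments that minimize the total cost."""
    best = min(total_cost(a) for a in assignments())
    return [a for a in assignments() if total_cost(a) == best]


def is_nash(a):
    """No player can lower its own cost by changing only its own value."""
    return all(
        player_cost(p, {**a, p: alt}) >= player_cost(p, a)
        for p in DOMAINS
        for alt in DOMAINS[p]
    )


for sol in optimal_solutions():
    print(sol, total_cost(sol), is_nash(sol))
```

Under this mapping a unilateral deviation changes only the constraints that mention the deviating variable, so it changes the total cost by exactly the change in that player's own cost; an assignment with minimal total cost therefore admits no profitable deviation. The converse does not hold in this instance: the assignment x=1, y=0, z=0 is also a Nash equilibrium but has a higher total cost.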
Combinatorial preference aggregation has many applications in AI. Given the exponential nature of these preferences, compact representations are needed, and ($m$)CP-nets are among the most studied ones. Sequential and global voting are two ways to aggregate preferences over CP-nets. In the former, preferences are aggregated feature by feature. Hence, when preferences have specific feature dependencies, sequential voting may exhibit voting paradoxes, i.e., it might select sub-optimal outcomes. To avoid paradoxes in sequential voting, one has often assumed the $\mathcal{O}$-legality restriction, which imposes a shared topological order among all the CP-nets. In global voting, by contrast, CP-nets are considered as a whole during preference aggregation. For this reason, global voting is immune to paradoxes, and there is no need to impose restrictions on the CP-nets' topological structure. Sequential voting over $\mathcal{O}$-legal CP-nets has been extensively investigated. Global voting over non-$\mathcal{O}$-legal CP-nets, on the other hand, has not been carefully analyzed, even though the literature has stated that a theoretical comparison between global and sequential voting is promising, and a precise complexity analysis for global voting has been asked for multiple times. Only a few works give results on the complexity of global voting over CP-nets, and those results are very partial. We start to fill this gap by carrying out a thorough complexity analysis of Pareto and majority global voting over not necessarily $\mathcal{O}$-legal acyclic binary polynomially connected ($m$)CP-nets. We settle these problems in the polynomial hierarchy, and some of them in PTIME or LOGSPACE, whereas EXPTIME was the previously known upper bound for most of them. We show various tight lower bounds and matching upper bounds for problems that to date did not have any explicit non-obvious lower bound.
Adrian Hutter, 2020
We consider a scenario in which two reinforcement learning agents repeatedly play a matrix game against each other and update their parameters after each round. The agents' decision-making is transparent to each other, which allows each agent to predict how their opponent will play against them. To prevent an infinite regress of both agents recursively predicting each other indefinitely, each agent is required to give an opponent-independent response with some probability at least epsilon. Transparency also allows each agent to anticipate and shape the other agent's gradient step, i.e. to move to regions of parameter space in which the opponent's gradient points in a direction favourable to them. We study the resulting dynamics experimentally, using two algorithms from previous literature (LOLA and SOS) for opponent-aware learning. We find that the combination of mutually transparent decision-making and opponent-aware learning robustly leads to mutual cooperation in a single-shot prisoner's dilemma. In a game of chicken, in which both agents try to manoeuvre their opponent towards their preferred equilibrium, converging to a mutually beneficial outcome turns out to be much harder, and opponent-aware learning can even lead to worst-case outcomes for both agents. This highlights the need to develop opponent-aware learning algorithms that achieve acceptable outcomes in social dilemmas involving an equilibrium selection problem.
We study an information-structure design problem (a.k.a. persuasion) with a single sender and multiple receivers with actions of a priori unknown types, independently drawn from action-specific marginal distributions. As in the standard Bayesian persuasion model, the sender has access to additional information regarding the action types, which she can exploit when committing to a (noisy) signaling scheme through which she sends a private signal to each receiver. The novelty of our model is in considering the case where the receivers interact in a sequential game with imperfect information, with utilities depending on the game outcome and the realized action types. After formalizing the notions of ex ante and ex interim persuasiveness (which differ in the time at which the receivers commit to following the sender's signaling scheme), we investigate the continuous optimization problem of computing a signaling scheme which maximizes the sender's expected revenue. We show that computing an optimal ex ante persuasive signaling scheme is NP-hard when there are three or more receivers. In contrast with previous hardness results for ex interim persuasion, we show that, for games with two receivers, an optimal ex ante persuasive signaling scheme can be computed in polynomial time thanks to a novel algorithm based on the ellipsoid method which we propose.
Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on determinantal point processes (DPP). By incorporating the diversity metric into best-response dynamics, we develop diverse fictitious play and diverse policy-space response oracle for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric is guaranteed to enlarge the gamescape, i.e. the convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve at least the same, and in most games, lower exploitability than PSRO solvers by finding effective and diverse strategies.
