
Combining local search techniques and path following for bimatrix games

Published by Nicola Gatti
Publication date: 2012
Research field: Informatics Engineering
Language: English





Computing a Nash equilibrium (NE) is a central task in computer science. An NE is a particularly appropriate solution concept for two-agent settings because coalitional deviations are not an issue. However, even in this case, finding an NE is PPAD-complete. In this paper, we combine path-following algorithms with local search techniques to design new algorithms for finding exact and approximate NEs. We show that our algorithms largely outperform the state of the art and that almost all the known benchmark game classes are easily solvable or approximable (except for the GAMUT CovariantGameRand class).
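To make the notion of an exact versus an approximate NE in a bimatrix game concrete, here is a minimal sketch that measures how far a strategy pair is from equilibrium. It is not the paper's algorithm. The function names (`expected_payoff`, `nash_regret`, `is_epsilon_nash`) and the convention that the approximation error is the largest gain available from a unilateral pure deviation are assumptions made for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def expected_payoff(M: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Expected payoff x^T M y for mixed strategies x (rows) and y (columns)."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def nash_regret(A: Matrix, B: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Largest gain either player could obtain by deviating to a pure strategy.

    A holds the row player's payoffs, B the column player's (hypothetical layout).
    """
    u_row = expected_payoff(A, x, y)
    u_col = expected_payoff(B, x, y)
    # Best pure response of the row player against y.
    best_row = max(sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x)))
    # Best pure response of the column player against x.
    best_col = max(sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(y)))
    return max(best_row - u_row, best_col - u_col)


def is_epsilon_nash(A: Matrix, B: Matrix, x: Sequence[float], y: Sequence[float],
                    eps: float = 1e-9) -> bool:
    """True if no player can gain more than eps by deviating unilaterally."""
    return nash_regret(A, B, x, y) <= eps


# Matching pennies: the only NE is uniform mixing for both players.
A = [[1.0, -1.0], [-1.0, 1.0]]
B = [[-1.0, 1.0], [1.0, -1.0]]
uniform = [0.5, 0.5]
pure_first = [1.0, 0.0]
```

With these inputs, `is_epsilon_nash(A, B, uniform, uniform)` holds, while the pure profile `(pure_first, pure_first)` leaves the column player a profitable deviation. A local search procedure could use a quantity like `nash_regret` as the objective to minimize, though whether the paper does so is not stated in the abstract.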


Read also

The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by AlphaZero. However, prior algorithms of this form cannot cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results in two different imperfect-information games show ReBeL converges to an approximate Nash equilibrium. We also show ReBeL achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.
Computing a Nash equilibrium in bimatrix games is PPAD-hard, and many works have focused on approximate solutions. When games are generated from a fixed unknown distribution, learning a Nash predictor via data-driven approaches can be preferable. In this paper, we study the learnability of approximate Nash equilibrium in bimatrix games. We prove that the Lipschitz function class is agnostic Probably Approximately Correct (PAC) learnable with respect to Nash approximation loss. Additionally, to demonstrate the advantages of learning a Nash predictor, we develop a model that can efficiently approximate solutions for games under the same distribution. We show by experiments that the solutions from our Nash predictor can serve as effective initializing points for other Nash solvers.
We study strategic games on weighted directed graphs, where the payoff of a player is defined as the sum of the weights on the edges from players who chose the same strategy augmented by a fixed non-negative bonus for picking a given strategy. These games capture the idea of coordination in the absence of globally common strategies. Prior work shows that the problem of determining the existence of a pure Nash equilibrium for these games is NP-complete already for graphs with all weights equal to one and no bonuses. However, for several classes of graphs (e.g. DAGs and cliques) pure Nash equilibria or even strong equilibria always exist and can be found by simply following a particular improvement or coalition-improvement path, respectively. In this paper we identify several natural classes of graphs for which a finite improvement or coalition-improvement path of polynomial length always exists, and, as a consequence, a Nash equilibrium or strong equilibrium in them can be found in polynomial time. We also argue that these results are optimal in the sense that in natural generalisations of these classes of graphs, a pure Nash equilibrium may not even exist.
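The idea of finding a pure Nash equilibrium by following an improvement path can be sketched as below. This is an illustrative sketch, not the procedure from the paper. The data layout (`in_edges` mapping a node to `(source, weight)` pairs, `bonus` keyed by `(node, strategy)`, per-node strategy lists) and the function names are assumptions. The payoff follows the abstract's definition: the sum of weights on incoming edges from players who chose the same strategy, plus a non-negative bonus.

```python
from typing import Dict, List, Optional, Tuple

Profile = Dict[str, str]
InEdges = Dict[str, List[Tuple[str, float]]]
Bonus = Dict[Tuple[str, str], float]


def payoff(node: str, choice: str, profile: Profile,
           in_edges: InEdges, bonus: Bonus) -> float:
    """Weight of incoming edges from same-strategy players, plus the bonus."""
    same = sum(w for src, w in in_edges.get(node, []) if profile[src] == choice)
    return same + bonus.get((node, choice), 0.0)


def follow_improvement_path(profile: Profile, strategies: Dict[str, List[str]],
                            in_edges: InEdges, bonus: Bonus,
                            max_steps: int = 10_000) -> Optional[Profile]:
    """Let players switch to strictly better strategies until none can improve.

    Returns a pure Nash equilibrium, or None if max_steps sweeps are exhausted
    (improvement paths need not terminate on arbitrary graphs).
    """
    profile = dict(profile)
    for _ in range(max_steps):
        improved = False
        for node in sorted(profile):
            current = payoff(node, profile[node], profile, in_edges, bonus)
            best = max(strategies[node],
                       key=lambda s: payoff(node, s, profile, in_edges, bonus))
            if payoff(node, best, profile, in_edges, bonus) > current:
                profile[node] = best
                improved = True
        if not improved:
            return profile
    return None


# Hypothetical DAG a -> b -> c with unit weights and no bonuses.
in_edges = {"b": [("a", 1.0)], "c": [("b", 1.0)]}
strategies = {"a": ["x"], "b": ["x", "y"], "c": ["y", "x"]}
start = {"a": "x", "b": "y", "c": "y"}
result = follow_improvement_path(start, strategies, in_edges, {})
```

On this small DAG, `b` first switches to match `a`, then `c` switches to match `b`, after which no player can improve.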
Bahman Kalantari, Chun Lau (2018)
Extensive study of the complexity of computing a Nash equilibrium has resulted in the definition of the complexity class PPAD by Papadimitriou \cite{Papa2}. Computing a Nash equilibrium was subsequently shown to be PPAD-complete, first by Daskalakis, Goldberg, and Papadimitriou \cite{Papa} for $3$ or more players, and then even for the bimatrix case by Chen and Deng \cite{Chen}. On the other hand, it is well known that Nash equilibria of games with smooth payoff functions are generally Pareto-inefficient \cite{Dubey}. In the spirit of von Neumann's Minimax Theorem and its polynomial-time solvability via Linear Programming, Kalantari \cite{Kalantari} has described a multilinear minimax relaxation (MMR) that provides an approximation to a convex combination of expected payoffs in any Nash equilibrium via LP. In this paper, we study this relaxation for the bimatrix game, solving its corresponding LP formulation and comparing its solution to the solution computed by the Lemke-Howson algorithm. We also give a game-theoretic interpretation of MMR for the bimatrix game involving a meta-player. Our relaxation has the following theoretical advantages: (1) It can be computed in polynomial time; (2) For at least one player, the computed MMR payoff is at least as good as any Nash equilibrium payoff; (3) There exists a convex scaling of the payoff matrices giving equal payoffs. Such a solution is a satisfactory compromise. Computationally, we have compared our approach with the state-of-the-art implementation of the Lemke-Howson algorithm \cite{Lemke}. We have observed the following advantages: (i) MMR outperformed Lemke-Howson in time complexity; (ii) In about $80\%$ of the cases, the MMR payoffs for both players are better than those of any Nash equilibrium; (iii) In the remaining $20\%$, while one player's payoff is better than any Nash equilibrium payoff, the other player's payoff is only within a relative error of $17\%$.
Coordination games describe social or economic interactions in which the adoption of a common strategy has a higher payoff. They are classically used to model the spread of conventions, behaviors, and technologies in societies. Here we consider a two-strategy coordination game played asynchronously between the nodes of a network. Agents behave according to a noisy best-response dynamics. It is known that noise removes the degeneracy among equilibria: in the long run, the "risk-dominant" behavior spreads throughout the network. Here we consider the problem of computing the typical time scale for the spread of this behavior. In particular, we study its dependence on the network structure and derive a dichotomy between highly connected, non-local graphs that show slow convergence, and poorly connected, low-dimensional graphs that show fast convergence.