ترغب بنشر مسار تعليمي؟ اضغط هنا

Better Regularization for Sequential Decision Spaces: Fast Convergence Rates for Nash, Correlated, and Team Equilibria

137   0   0.0 ( 0 )
 نشر من قبل Gabriele Farina
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We study the application of iterative first-order methods to the problem of computing equilibria of large-scale two-player extensive-form games. First-order methods must typically be instantiated with a regularizer that serves as a distance-generating function for the decision sets of the players. For the case of two-player zero-sum games, the state-of-the-art theoretical convergence rate for Nash equilibrium is achieved by using the dilated entropy function. In this paper, we introduce a new entropy-based distance-generating function for two-player zero-sum games, and show that this function achieves significantly better strong convexity properties than the dilated entropy, while maintaining the same easily-implemented closed-form proximal mapping. Extensive numerical simulations show that these superior theoretical properties translate into better numerical performance as well. We then generalize our new entropy distance function, as well as general dilated distance functions, to the scaled extension operator. The scaled extension operator is a way to recursively construct convex sets, which generalizes the decision polytope of extensive-form games, as well as the convex polytopes corresponding to correlated and team equilibria. By instantiating first-order methods with our regularizers, we develop the first accelerated first-order methods for computing correlated equilibra and ex-ante coordinated team equilibria. Our methods have a guaranteed $1/T$ rate of convergence, along with linear-time proximal updates.

قيم البحث

اقرأ أيضاً

We investigate the computation of equilibria in extensive-form games where ex ante correlation is possible, focusing on correlated equilibria requiring the least amount of communication between the players and the mediator. Motivated by the hardness results on the computation of normal-form correlated equilibria, we introduce the notion of normal-form coarse correlated equilibrium, extending the definition of coarse correlated equilibrium to sequential games. We show that, in two-player games without chance moves, an optimal (e.g., social welfare maximizing) normal-form coarse correlated equilibrium can be computed in polynomial time, and that in general multi-player games (including two-player games with Chance), the problem is NP-hard. For the former case, we provide a polynomial-time algorithm based on the ellipsoid method and also propose a more practical one, which can be efficiently applied to problems of considerable size. Then, we discuss how our algorithm can be extended to games with Chance and games with more than two players.
Our paper concerns the computation of Nash equilibria of first-price auctions with correlated values. While there exist several equilibrium computation methods for auctions with independent values, the correlation of the bidders values introduces sig nificant complications that render existing methods unsatisfactory in practice. Our contribution is a step towards filling this gap: inspired by the seminal fictitious play process of Brown and Robinson, we present a learning heuristic-that we call fictitious bidding (FB)-for estimating Bayes-Nash equilibria of first-price auctions with correlated values, and we assess the performance of this heuristic on several relevant examples.
Despite the many recent practical and theoretical breakthroughs in computational game theory, equilibrium finding in extensive-form team games remains a significant challenge. While NP-hard in the worst case, there are provably efficient algorithms f or certain families of team game. In particular, if the game has common external information, also known as A-loss recall -- informally, actions played by non-team members (i.e., the opposing team or nature) are either unknown to the entire team, or common knowledge within the team -- then polynomial-time algorithms exist (Kaneko and Kline, 1995). In this paper, we devise a completely new algorithm for solving team games. It uses a tree decomposition of the constraint system representing each teams strategy to reduce the number and degree of constraints required for correctness (tightness of the mathematical program). Our algorithm reduces the problem of solving team games to a linear program with at most $NW^{w+O(1)}$ nonzero entries in the constraint matrix, where $N$ is the size of the game tree, $w$ is a parameter that depends on the amount of uncommon external information, and $W$ is the treewidth of the tree decomposition. In public-action games, our program size is bounded by the tighter $tilde O(3^t 2^{t(n-1)}NW)$ for teams of $n$ players with $t$ types each. Since our algorithm describes the polytope of correlated strategies directly, we get equilibrium finding in correlated strategies for free -- instead of, say, having to run a double oracle algorithm. We show via experiments on a standard suite of games that our algorithm achieves state-of-the-art performance on all benchmark game classes except one. We also present, to our knowledge, the first experiments for this setting where more than one team has more than one member.
Graphical games are a useful framework for modeling the interactions of (selfish) agents who are connected via an underlying topology and whose behaviors influence each other. They have wide applications ranging from computer science to economics and biology. Yet, even though a players payoff only depends on the actions of their direct neighbors in graphical games, computing the Nash equilibria and making statements about the convergence time of natural local dynamics in particular can be highly challenging. In this work, we present a novel approach for classifying complexity of Nash equilibria in graphical games by establishing a connection to local graph algorithms, a subfield of distributed computing. In particular, we make the observation that the equilibria of graphical games are equivalent to locally verifiable labelings (LVL) in graphs; vertex labelings which are verifiable with a constant-round local algorithm. This connection allows us to derive novel lower bounds on the convergence time to equilibrium of best-response dynamics in graphical games. Since we establish that distributed convergence can sometimes be provably slow, we also introduce and give bounds on an intuitive notion of time-constrained inefficiency of best responses. We exemplify how our results can be used in the implementation of mechanisms that ensure convergence of best responses to a Nash equilibrium. Our results thus also give insight into the convergence of strategy-proof algorithms for graphical games, which is still not well understood.
Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at pro ducing competent players in general-sum games or those with more than two players than in two-player, zero-sum games. An appealing alternative is to consider adaptive algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior. This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium. We develop and advocate for this hindsight rationality framing of learning in general sequential decision-making settings. To this end, we re-examine mediated equilibrium and deviation types in extensive-form games, thereby gaining a more complete understanding and resolving past misconceptions. We present a set of examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature, and prove that no tractable concept subsumes all others. This line of inquiry culminates in the definition of the deviation and equilibrium classes that correspond to algorithms in the counterfactual regret minimization (CFR) family, relating them to all others in the literature. Examining CFR in greater detail further leads to a new recursive definition of rationality in correlated play that extends sequential rationality in a way that naturally applies to hindsight evaluation.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا