Approximate exploitability: Learning a best response in large games

Added by Finbarr Timbers

Publication date 2020

Fields: Informatics Engineering, Mathematical Statistics

Research language: English

Authors Finbarr Timbers - Edward Lockhart - Marc Lanctot

Machine Learning

Abstract

A standard metric used to measure the approximate optimality of policies in imperfect information games is exploitability, i.e., the performance of a policy against its worst-case opponent. However, exploitability is intractable to compute in large games, as it requires a full traversal of the game tree to calculate a best response to the given policy. We introduce a new metric, approximate exploitability, that calculates an analogous metric using an approximate best response; the approximation is done by using search and reinforcement learning. This is a generalization of local best response, a domain-specific evaluation metric used in poker. We provide empirical results for a specific instance of the method, demonstrating that our method converges to exploitability in the tabular and function approximation settings for small games. In large games, our method learns to exploit both strong and weak agents, learning to exploit an AlphaZero agent.
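To make the notion of exploitability concrete, here is a minimal sketch for a two-player zero-sum matrix game, where an exact best response is just a maximum or minimum over pure actions. The game (rock-paper-scissors), the function names, and the convention of averaging the two players' best-response gains are assumptions chosen for illustration; none of this is taken from the paper's implementation, which targets games too large for exact enumeration.

```python
from typing import List, Sequence

# Row player's payoffs in rock-paper-scissors (an illustrative assumption).
RPS: List[List[float]] = [
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
]


def row_values(payoff: Sequence[Sequence[float]], col_policy: Sequence[float]) -> List[float]:
    """Expected payoff to the row player of each pure row action."""
    return [sum(p * a for p, a in zip(col_policy, row)) for row in payoff]


def col_values(payoff: Sequence[Sequence[float]], row_policy: Sequence[float]) -> List[float]:
    """Expected payoff to the row player when the column player picks each pure column."""
    n_cols = len(payoff[0])
    return [
        sum(row_policy[i] * payoff[i][j] for i in range(len(payoff)))
        for j in range(n_cols)
    ]


def exploitability(
    payoff: Sequence[Sequence[float]],
    row_policy: Sequence[float],
    col_policy: Sequence[float],
) -> float:
    """Average gain available to exact best responders against the policy pair."""
    # Best the row player can do against the fixed column policy.
    row_best = max(row_values(payoff, col_policy))
    # Worst the column player can inflict on the fixed row policy.
    col_best = min(col_values(payoff, row_policy))
    return (row_best - col_best) / 2.0


uniform = [1.0 / 3.0] * 3
skewed = [0.6, 0.3, 0.1]
print(exploitability(RPS, uniform, uniform))
print(exploitability(RPS, skewed, uniform))
```

In this toy setting the maximum and minimum can be enumerated directly. In a large game that enumeration is what becomes intractable, and the abstract describes replacing it with a best response approximated by search and reinforcement learning.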

Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent

Edward Lockhart, Marc Lanctot, Julien Perolat (2019)

In this paper, we present exploitability descent, a new algorithm to compute approximate equilibria in two-player zero-sum extensive-form games with imperfect information, by direct policy optimization against worst-case opponents. We prove that when following this optimization, the exploitability of a player's strategy converges asymptotically to zero, and hence when both players employ this optimization, the joint policies converge to a Nash equilibrium. Unlike fictitious play (XFP) and counterfactual regret minimization (CFR), our convergence result pertains to the policies being optimized rather than the average policies. Our experiments demonstrate convergence rates comparable to XFP and CFR in four benchmark games in the tabular case. Using function approximation, we find that our algorithm outperforms the tabular version in two of the games, which, to the best of our knowledge, is the first such result in imperfect information games among this class of algorithms.
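The idea of optimizing a policy directly against its worst-case opponent can be illustrated on a matrix game. The sketch below is not the authors' algorithm: it assumes rock-paper-scissors (game value 0), a policy stored directly as a probability vector, and projected subgradient steps with a decaying step size. All function names and hyperparameters are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple

# Row player's payoffs in rock-paper-scissors (an illustrative assumption).
RPS: List[List[float]] = [
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
]


def column_values(payoff: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Row player's expected payoff against each pure column action."""
    return [sum(x[i] * payoff[i][j] for i in range(len(x))) for j in range(len(payoff[0]))]


def worst_case_gap(payoff: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    """Game value (assumed to be 0 here) minus the payoff against a best responder."""
    return 0.0 - min(column_values(payoff, x))


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection onto the probability simplex (sort-based method)."""
    theta, running = 0.0, 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        running += u_k
        candidate = (running - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate  # keep the threshold from the largest valid k
    return [max(v_i - theta, 0.0) for v_i in v]


def ascend_against_best_response(
    payoff: Sequence[Sequence[float]],
    x: Sequence[float],
    steps: int = 2000,
    base_step: float = 0.1,
) -> Tuple[List[float], float]:
    """Repeatedly step the policy toward a higher payoff against its current best responder."""
    x = list(x)
    best_x, best_gap = list(x), worst_case_gap(payoff, x)
    for t in range(1, steps + 1):
        values = column_values(payoff, x)
        # The opponent's best response is the column that minimizes our payoff.
        j = min(range(len(values)), key=values.__getitem__)
        step = base_step / math.sqrt(t)
        # Column j of the payoff matrix is a subgradient of the worst-case payoff.
        x = project_to_simplex([x[i] + step * payoff[i][j] for i in range(len(x))])
        gap = worst_case_gap(payoff, x)
        if gap < best_gap:
            best_x, best_gap = list(x), gap
    return best_x, best_gap


policy, gap = ascend_against_best_response(RPS, [0.6, 0.3, 0.1])
print(policy, gap)
```

Starting from a skewed policy, the iterates move toward the uniform policy, which is the equilibrium of rock-paper-scissors. The sketch keeps the best iterate seen because individual subgradient steps are not guaranteed to decrease the gap.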

Artificial Intelligence Computer Science and Game Theory Machine Learning

Approximate Guarantees for Dictionary Learning

Aditya Bhaskara, Wai Ming Tai (2019)

In the dictionary learning (or sparse coding) problem, we are given a collection of signals (vectors in $\mathbb{R}^d$), and the goal is to find a basis in which the signals have a sparse (approximate) representation. The problem has received a lot of attention in signal processing, learning, and theoretical computer science. The problem is formalized as factorizing a matrix $X$ ($d \times n$), whose columns are the signals, as $X = AY$, where $A$ has a prescribed number $m$ of columns (typically $m \ll n$), and $Y$ has columns that are $k$-sparse (typically $k \ll d$). Most of the known theoretical results involve assuming that the columns of the unknown $A$ have certain incoherence properties, and that the coefficient matrix $Y$ has random (or partly random) structure. The goal of our work is to understand what can be said in the absence of such assumptions. Can we still find $A$ and $Y$ such that $X \approx AY$? We show that this is possible, if we allow violating the bounds on $m$ and $k$ by appropriate factors that depend on $k$ and the desired approximation. Our results rely on an algorithm for what we call the threshold correlation problem, which turns out to be related to hypercontractive norms of matrices. We also show that our algorithmic ideas apply to a setting in which some of the columns of $X$ are outliers, thus giving similar guarantees even in this challenging setting.

Machine Learning

Bayesian Learning in Undirected Graphical Models: Approximate MCMC algorithms

Iain Murray, Zoubin Ghahramani (2012)

Bayesian learning in undirected graphical models, that is, computing posterior distributions over parameters and predictive quantities, is exceptionally difficult. We conjecture that for general undirected models, there are no tractable MCMC (Markov Chain Monte Carlo) schemes giving the correct equilibrium distribution over parameters. While this intractability, due to the partition function, is familiar to those performing parameter optimisation, Bayesian learning of posterior distributions over undirected model parameters has been unexplored and poses novel challenges. We propose several approximate MCMC schemes and test them on fully observed binary models (Boltzmann machines) for a small coronary heart disease data set and larger artificial systems. While approximations must perform well on the model, their interaction with the sampling scheme is also important. Samplers based on variational mean-field approximations generally performed poorly; more advanced methods using loopy propagation, brief sampling and stochastic dynamics led to acceptable parameter posteriors. Finally, we demonstrate these techniques on a Markov random field with hidden variables.

Machine Learning

Iterative Best Response for Multi-Body Asset-Guarding Games

Emmanuel Sin, Murat Arcak, Douglas Philbrick (2020)

We present a numerical approach to finding optimal trajectories for players in a multi-body, asset-guarding game with nonlinear dynamics and non-convex constraints. Using the Iterative Best Response (IBR) scheme, we solve for each player's optimal strategy assuming the other players' trajectories are known and fixed. Leveraging recent advances in Sequential Convex Programming (SCP), we use SCP as a subroutine within the IBR algorithm to efficiently solve an approximation of each player's constrained trajectory optimization problem. We apply the approach to an asset-guarding game example involving multiple pursuers and a single evader (i.e., n-versus-1 engagements). Resulting evader trajectories are tested in simulation to verify successful evasion against pursuers using conventional intercept guidance laws.

Systems and Control Multiagent Systems

Learning to Play No-Press Diplomacy with Best Response Policy Iteration

Thomas Anthony, Tom Eccles, Andrea Tacchetti (2020)

Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and StarCraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state-of-the-art, and game-theoretic equilibrium analysis shows that the new process yields consistent improvements.

Machine Learning Artificial Intelligence Computer Science and Game Theory
