We introduce the class of pay or play games, which captures scenarios in which each decision maker is faced with a choice between two actions: one with a fixed payoff and another with a payoff dependent on others' selected actions. This is, arguably, the simplest setting that models selection among certain and uncertain outcomes in a multi-agent system. We study the properties of equilibria in such games from both a game-theoretic perspective and a computational perspective. Our main positive result establishes the existence of a semi-strong equilibrium in every such game. We show that although simple, pay or play games contain a large variety of well-studied environments, e.g., vaccination games. We discuss the interesting implications of our results for these environments.
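To make the model concrete, the sketch below encodes a three-player pay-or-play instance (all payoff numbers are hypothetical, not taken from the paper) and finds its pure Nash equilibria by brute force; "pay" earns a fixed amount while the value of "play" grows with the number of other players who also play.

```python
from itertools import product

# A minimal pay-or-play sketch (all numbers are hypothetical): action 0
# ("pay") yields a fixed payoff, action 1 ("play") yields a payoff that
# depends on how many other players also chose "play".
n = 3
FIXED = 1.0  # payoff for choosing "pay"

def payoff(i, profile):
    if profile[i] == 0:
        return FIXED
    others_playing = sum(profile) - profile[i]
    return 0.5 + 0.75 * others_playing  # illustrative externality

def pure_nash_equilibria(n):
    eqs = []
    for profile in product((0, 1), repeat=n):
        deviations = (profile[:i] + (1 - profile[i],) + profile[i + 1:]
                      for i in range(n))
        if all(payoff(i, profile) >= payoff(i, dev)
               for i, dev in enumerate(deviations)):
            eqs.append(profile)
    return eqs

print(pure_nash_equilibria(n))  # [(0, 0, 0), (1, 1, 1)] under these numbers
```

Under these illustrative numbers the game has two pure equilibria, all-pay and all-play, mirroring the tension between certain and uncertain outcomes that the abstract describes.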
We study the strategic implications that arise from adding one extra option to the miners participating in the Bitcoin protocol. We propose that, when adding a block, a miner may also pay forward an amount to be collected by the first miner who successfully extends that branch, giving miners the power to influence mining incentives. We formulate a stochastic game for the study of such incentives and show that, with this added option, smaller miners can guarantee that the best response of even substantially more powerful miners is to follow the expected behavior intended by the protocol designer.
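As a rough, one-step illustration of the incentive channel only (the paper analyses a full stochastic game), the snippet below compares a miner's expected reward from extending a tip carrying a hypothetical pay-forward bounty against extending a competing tip without one; all quantities are assumptions for illustration.

```python
# One-step expected-reward comparison, illustrating the incentive channel
# only, not the paper's stochastic game. All numbers are hypothetical:
# p is the miner's hashrate share, R the block reward, bounty a pay-forward
# amount attached to one chain tip.
def expected_next_block_reward(p, R, bounty=0.0):
    # With probability ~p the miner finds the next block on the chosen tip.
    return p * (R + bounty)

p, R, b = 0.4, 6.25, 0.5
print(expected_next_block_reward(p, R, bounty=b))  # 2.7 on the bounty-carrying tip
print(expected_next_block_reward(p, R))            # 2.5 on the competing tip
```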
Pay-per-last-$N$-shares (PPLNS) is one of the most common payout strategies used by mining pools in Proof-of-Work (PoW) cryptocurrencies. As with any payment scheme, it is imperative to study the incentive compatibility of miners within the pool. For PPLNS this question has only been partially answered: we know that reasonably sized miners within a PPLNS pool prefer following the pool protocol over employing specific deviations. In this paper, we present a novel modification of PPLNS in which we randomise the protocol in a natural way. We call our protocol Randomised pay-per-last-$N$-shares (RPPLNS), and note that its randomised structure greatly simplifies the study of its incentive compatibility. We show that RPPLNS maintains the strengths of PPLNS (i.e., fairness, variance reduction, and resistance to pool hopping) while also being robust against a richer class of strategic mining than has been shown for PPLNS.
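The abstract does not spell out the randomisation; one natural reading, assumed here purely for illustration, is to award the entire block reward to the owner of a single uniformly random share among the last $N$, which matches PPLNS in expectation. The sketch below contrasts that hypothetical rule with standard PPLNS accounting.

```python
import random
from collections import Counter

def pplns_payout(last_n_shares, reward):
    """Standard PPLNS: split `reward` over the last N shares in
    proportion to each miner's share count."""
    counts = Counter(last_n_shares)
    n = len(last_n_shares)
    return {miner: reward * c / n for miner, c in counts.items()}

def rpplns_payout(last_n_shares, reward, rng=random):
    """One natural randomisation (an assumption, not necessarily the
    paper's exact rule): award the whole reward to the owner of a
    uniformly random share among the last N. Each miner's expected
    payout then coincides with PPLNS."""
    winner = rng.choice(last_n_shares)
    return {winner: reward}

shares = ["alice"] * 6 + ["bob"] * 4       # last N = 10 shares
print(pplns_payout(shares, reward=6.25))   # {'alice': 3.75, 'bob': 2.5}
print(rpplns_payout(shares, reward=6.25))  # {'alice': 6.25} w.p. 0.6, else bob
```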
Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at producing competent players in general-sum games, or games with more than two players, than it has been in two-player, zero-sum games. An appealing alternative is to consider adaptive algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior. This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than in factored agent behavior at equilibrium. We develop and advocate for this hindsight rationality framing of learning in general sequential decision-making settings. To this end, we re-examine mediated equilibrium and deviation types in extensive-form games, thereby gaining a more complete understanding and resolving past misconceptions. We present a set of examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature, and prove that no tractable concept subsumes all others. This line of inquiry culminates in the definition of the deviation and equilibrium classes that correspond to algorithms in the counterfactual regret minimization (CFR) family, relating them to all others in the literature. Examining CFR in greater detail further leads to a new recursive definition of rationality in correlated play that extends sequential rationality in a way that naturally applies to hindsight evaluation.
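CFR itself operates on extensive-form games; as a minimal, self-contained illustration of the regret-minimization machinery the CFR family builds on, the sketch below runs regret matching in self-play on rock-paper-scissors, where the players' average strategies approach the equilibrium (1/3, 1/3, 1/3).

```python
import numpy as np

# Regret matching (Hart & Mas-Colell) in self-play on rock-paper-scissors.
# This is the per-information-set learner underlying CFR, shown here in the
# one-shot matrix-game case as an illustration only.
payoff = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]], dtype=float)  # row player's payoffs

def regret_matching(r):
    # Play actions in proportion to positive cumulative regret.
    pos = np.maximum(r, 0.0)
    s = pos.sum()
    return pos / s if s > 0 else np.full(len(r), 1.0 / len(r))

T = 20000
regrets = [np.zeros(3), np.zeros(3)]
avg = [np.zeros(3), np.zeros(3)]
rng = np.random.default_rng(0)
for _ in range(T):
    strategies = [regret_matching(r) for r in regrets]
    actions = [rng.choice(3, p=s) for s in strategies]
    # Utility of each action against the opponent's realised action.
    u0 = payoff[:, actions[1]]
    u1 = -payoff[actions[0], :]
    regrets[0] += u0 - u0[actions[0]]
    regrets[1] += u1 - u1[actions[1]]
    for i in range(2):
        avg[i] += strategies[i]

print([a / T for a in avg])  # average strategies approach (1/3, 1/3, 1/3)
```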
We present fictitious play dynamics for stochastic games and analyze their convergence properties in zero-sum stochastic games. Our dynamics involve players forming beliefs about the opponent's strategy and about their own continuation payoff (Q-function), and playing a greedy best response using the estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that the beliefs on Q-functions are updated at a slower timescale than the beliefs on strategies. We show that, in both the model-based and the model-free case (without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
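A schematic single-state update step capturing the two-timescale idea is sketched below; the step sizes, shapes, and update target are illustrative assumptions rather than the paper's exact conditions, the only preserved property being that the Q-beliefs move more slowly than the strategy beliefs.

```python
import numpy as np

# Schematic single-state step of two-timescale fictitious play for a
# zero-sum stochastic game. Step sizes and the update target are
# illustrative assumptions; the point is beta / alpha -> 0, i.e. the
# Q-beliefs move on a slower timescale than the strategy beliefs.
def fp_step(t, pi_belief, Q, opp_action, stage_payoff, next_value, gamma=0.9):
    alpha = 1.0 / (t + 1)                    # fast: belief on opponent strategy
    beta = 1.0 / ((t + 2) * np.log(t + 2))   # slow: belief on own Q-function
    action = int(np.argmax(Q @ pi_belief))   # greedy best response to beliefs
    e = np.zeros_like(pi_belief)
    e[opp_action] = 1.0
    pi_belief = pi_belief + alpha * (e - pi_belief)  # empirical-frequency update
    target = stage_payoff + gamma * next_value       # model-free payoff target
    Q = Q.copy()
    Q[action, opp_action] += beta * (target - Q[action, opp_action])
    return action, pi_belief, Q

# Example shapes: two actions per player, uniform initial beliefs.
pi0 = np.full(2, 0.5)
Q0 = np.zeros((2, 2))
a, pi1, Q1 = fp_step(t=0, pi_belief=pi0, Q=Q0, opp_action=1,
                     stage_payoff=1.0, next_value=0.0)
```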
A dynamic agent model is introduced with an annual random wealth-multiplicative process followed by taxes paid according to a linear wealth-dependent tax rate. If poor agents pay higher tax rates than rich agents, all wealth eventually becomes concentrated in the hands of a single agent. By contrast, if poor agents are subject to lower tax rates, the collective economic process continues forever.
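A toy simulation in the spirit of the model is sketched below; the shock distribution, the linear-rate parameterisation, and the equal redistribution of collected taxes are all assumptions made for illustration rather than details from the paper.

```python
import numpy as np

# Toy simulation in the spirit of the model (shock distribution, linear-rate
# form, and equal redistribution of taxes are illustrative assumptions).
# A negative slope makes the rate fall with relative wealth, i.e. poor
# agents pay higher rates; a positive slope is the progressive case.
def simulate(slope, n_agents=1000, years=2000, base_rate=0.2, seed=0):
    rng = np.random.default_rng(seed)
    w = np.ones(n_agents)
    for _ in range(years):
        w *= rng.lognormal(mean=0.0, sigma=0.3, size=n_agents)  # annual shock
        rel = w / w.max()                              # wealth relative to richest
        rate = np.clip(base_rate + slope * rel, 0.0, 0.9)  # linear tax rate
        tax = rate * w
        w = w - tax + tax.sum() / n_agents             # equal redistribution
    return w.max() / w.sum()                           # share held by richest agent

print(simulate(slope=-0.15))  # regressive rates: expect the top share to grow
print(simulate(slope=+0.15))  # progressive rates: expect the top share to stay modest
```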