ﻻ يوجد ملخص باللغة العربية
Stochastic games, introduced by Shapley, model adversarial interactions in stochastic environments where two players choose their moves to optimize a discounted-sum of rewards. In the traditional discounted reward semantics, long-term weights are geometrically attenuated based on the delay in their occurrence. We propose a temporally dual notion -- called past-discounting -- where agents have geometrically decaying memory of the rewards encountered during a play of the game. We study past-discounted weight sequences as rewards on stochastic game arenas and examine the corresponding stochastic games with discounted and mean payoff objectives. We dub these games forgetful discounted games and forgetful mean payoff games, respectively. We establish positional determinacy of these games and recover classical complexity results and a Tauberian theorem in the context of past discounted reward sequences.
In this paper we introduce a novel flow representation for finite games in strategic form. This representation allows us to develop a canonical direct sum decomposition of an arbitrary game into three components, which we refer to as the potential, h
The price of anarchy has become a standard measure of the efficiency of equilibria in games. Most of the literature in this area has focused on establishing worst-case bounds for specific classes of games, such as routing games or more general conges
We present fictitious play dynamics for stochastic games and analyze its convergence properties in zero-sum stochastic games. Our dynamics involves players forming beliefs on opponent strategy and their own continuation payoff (Q-function), and playi
Cooperative interval game is a cooperative game in which every coalition gets assigned some closed real interval. This models uncertainty about how much the members of a coalition get for cooperating together. In this paper we study convexity, core
Motivated by applications to online advertising and recommender systems, we consider a game-theoretic model with delayed rewards and asynchronous, payoff-based feedback. In contrast to previous work on delayed multi-armed bandits, we focus on multi-p