
Discounting the Past in Stochastic Games

Posted by Taylor Dohmen
Publication date: 2021
Research field: Informatics Engineering
Paper language: English





Stochastic games, introduced by Shapley, model adversarial interactions in stochastic environments where two players choose their moves to optimize a discounted sum of rewards. In the traditional discounted reward semantics, long-term weights are geometrically attenuated based on the delay in their occurrence. We propose a temporally dual notion -- called past-discounting -- where agents have geometrically decaying memory of the rewards encountered during a play of the game. We study past-discounted weight sequences as rewards on stochastic game arenas and examine the corresponding stochastic games with discounted and mean payoff objectives. We dub these games forgetful discounted games and forgetful mean payoff games, respectively. We establish positional determinacy of these games and recover classical complexity results and a Tauberian theorem in the context of past discounted reward sequences.
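As a rough, informal illustration of the contrast between the two semantics (a sketch under assumed notation, not the paper's formal construction), the snippet below computes the classical future-discounted sum of a reward sequence alongside a past-discounted running aggregate in which earlier rewards are geometrically forgotten; the names gamma and lam and the unnormalized recurrence are illustrative choices.

```python
# Illustrative sketch only: contrasts future discounting with a
# past-discounted ("forgetful") running aggregate of rewards.
# gamma, lam, and the unnormalized recurrence are assumptions for
# illustration, not the paper's exact definitions.

def future_discounted_sum(rewards, gamma=0.9):
    """Classical discounted-sum objective: sum_t gamma^t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def past_discounted_sequence(rewards, lam=0.9):
    """Past-discounted weights: d_t = r_t + lam * d_{t-1},
    i.e. a geometrically decaying memory of earlier rewards."""
    d, out = 0.0, []
    for r in rewards:
        d = r + lam * d
        out.append(d)
    return out

if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0, 0.0, 1.0]
    print(future_discounted_sum(rewards))     # a single scalar objective
    print(past_discounted_sequence(rewards))  # a new reward sequence, one value per step
```

The past-discounted values form a new reward sequence, which is in the spirit of the paper's use of past-discounted weight sequences as rewards that the forgetful discounted and forgetful mean payoff objectives then aggregate.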


Read also

In this paper we introduce a novel flow representation for finite games in strategic form. This representation allows us to develop a canonical direct sum decomposition of an arbitrary game into three components, which we refer to as the potential, harmonic and nonstrategic components. We analyze natural classes of games that are induced by this decomposition, and in particular, focus on games with no harmonic component and games with no potential component. We show that the first class corresponds to the well-known potential games. We refer to the second class of games as harmonic games, and study the structural and equilibrium properties of this new class of games. Intuitively, the potential component of a game captures interactions that can equivalently be represented as a common interest game, while the harmonic part represents the conflicts between the interests of the players. We make this intuition precise, by studying the properties of these two classes, and show that indeed they have quite distinct and remarkable characteristics. For instance, while finite potential games always have pure Nash equilibria, harmonic games generically never do. Moreover, we show that the nonstrategic component does not affect the equilibria of a game, but plays a fundamental role in their efficiency properties, thus decoupling the location of equilibria and their payoff-related properties. Exploiting the properties of the decomposition framework, we obtain explicit expressions for the projections of games onto the subspaces of potential and harmonic games. This enables an extension of the properties of potential and harmonic games to nearby games. We exemplify this point by showing that the set of approximate equilibria of an arbitrary game can be characterized through the equilibria of its projection onto the set of potential games.
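As a small, self-contained illustration of the potential-game class mentioned above (not of the flow decomposition itself), the sketch below checks the standard Monderer-Shapley four-cycle condition for a two-player finite game; the function name and the example matrices are hypothetical.

```python
import numpy as np

def is_exact_potential_game(u1, u2, tol=1e-9):
    """Check the Monderer-Shapley four-cycle condition for a two-player
    finite game given by payoff matrices u1, u2 of the same shape
    (rows index player 1's actions, columns index player 2's actions)."""
    m, n = u1.shape
    for a1 in range(m):
        for b1 in range(m):
            for a2 in range(n):
                for b2 in range(n):
                    # sum of payoff changes around a closed cycle of
                    # unilateral deviations must vanish in a potential game
                    cycle = ((u1[b1, a2] - u1[a1, a2])
                             + (u2[b1, b2] - u2[b1, a2])
                             + (u1[a1, b2] - u1[b1, b2])
                             + (u2[a1, a2] - u2[a1, b2]))
                    if abs(cycle) > tol:
                        return False
    return True

# A common-interest (coordination) game is a potential game;
# matching pennies, a game of pure conflict, is not.
coordination = np.array([[2.0, 0.0], [0.0, 1.0]])
pennies_u1 = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(is_exact_potential_game(coordination, coordination))  # True
print(is_exact_potential_game(pennies_u1, -pennies_u1))     # False
```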
The price of anarchy has become a standard measure of the efficiency of equilibria in games. Most of the literature in this area has focused on establishing worst-case bounds for specific classes of games, such as routing games or more general congestion games. Recently, the price of anarchy in routing games has been studied as a function of the traffic demand, providing asymptotic results in light and heavy traffic. The aim of this paper is to study the price of anarchy in nonatomic routing games in the intermediate region of the demand. To achieve this goal, we begin by establishing some smoothness properties of Wardrop equilibria and social optima for general smooth costs. In the case of affine costs we show that the equilibrium is piecewise linear, with break points at the demand levels at which the set of active paths changes. We prove that the number of such break points is finite, although it can be exponential in the size of the network. Exploiting a scaling law between the equilibrium and the social optimum, we derive a similar behavior for the optimal flows. We then prove that in any interval between break points the price of anarchy is smooth and it is either monotone (decreasing or increasing) over the full interval, or it decreases up to a certain minimum point in the interior of the interval and increases afterwards. We deduce that for affine costs the maximum of the price of anarchy can only occur at the break points. For general costs we provide counterexamples showing that the set of break points is not always finite.
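For a concrete feel for how the price of anarchy varies with the demand, here is a minimal two-link example with affine costs (a Pigou-style network, chosen for illustration and far simpler than the paper's general setting):

```python
def wardrop_cost(d):
    """Equilibrium cost on two parallel links with costs c1(x) = x and c2(x) = 1."""
    if d <= 1.0:             # all traffic uses link 1
        return d * d
    return 1.0 + (d - 1.0)   # link 1 saturates at flow 1; the rest uses link 2

def optimal_cost(d):
    """Socially optimal cost: minimize x*x + (d - x) over 0 <= x <= d."""
    x = min(d, 0.5)
    return x * x + (d - x)

for d in [0.25, 0.5, 0.75, 1.0, 2.0, 10.0]:
    print(d, wardrop_cost(d) / optimal_cost(d))

# The price of anarchy peaks (at 4/3) exactly at the demand level d = 1
# where the set of used links changes, i.e. at a break point of the
# piecewise linear equilibrium, and tends back to 1 in heavy traffic.
```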
We present fictitious play dynamics for stochastic games and analyze its convergence properties in zero-sum stochastic games. Our dynamics involves players forming beliefs on the opponent's strategy and their own continuation payoff (Q-function), and playing a greedy best response using estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that the update of the beliefs on Q-functions occurs at a slower timescale than the update of the beliefs on strategies. We show that, both in the model-based and model-free cases (without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
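In the single-state case these dynamics roughly reduce to classical fictitious play on a zero-sum matrix game, where each player best-responds to the empirical frequency of the opponent's past actions. A minimal sketch of that special case (the matrix, step count, and tie-breaking are arbitrary choices, not the paper's construction):

```python
import numpy as np

def fictitious_play(A, steps=5000):
    """Fictitious play on a zero-sum matrix game with payoff matrix A
    (the row player maximizes x^T A y, the column player minimizes)."""
    m, n = A.shape
    row_counts = np.zeros(m)
    col_counts = np.zeros(n)
    row_counts[0] += 1            # arbitrary initial actions
    col_counts[0] += 1
    for _ in range(steps):
        y = col_counts / col_counts.sum()   # belief about the column player
        x = row_counts / row_counts.sum()   # belief about the row player
        row_counts[np.argmax(A @ y)] += 1   # greedy best responses to beliefs
        col_counts[np.argmin(x @ A)] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Matching pennies: the empirical frequencies approach the (0.5, 0.5) equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(fictitious_play(A))
```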
Jan Bok (2018)
A cooperative interval game is a cooperative game in which every coalition gets assigned some closed real interval. This models uncertainty about how much the members of a coalition get for cooperating together. In this paper we study convexity, the core, and the Shapley value of games with interval uncertainty. Our motivation to do so is twofold. First, we want to capture which properties are preserved when we generalize concepts from classical cooperative game theory to interval games. Second, since these generalizations can be done in different ways, mainly with regard to the resulting level of uncertainty, we try to compare them and show their relation to each other.
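One simple way to experiment with interval games is to apply the classical Shapley formula to the lower and upper boundary games of an interval game separately; under suitable monotonicity assumptions this coincides with the interval Shapley value studied in this line of work, but the sketch below is purely illustrative (the two-player game and all names are made up).

```python
from itertools import permutations

def shapley(n, v):
    """Classical Shapley value of an n-player TU game v: frozenset -> float."""
    phi = [0.0] * n
    perms = list(permutations(range(n)))
    for order in perms:
        coalition = set()
        for player in order:
            before = v(frozenset(coalition))
            coalition.add(player)
            phi[player] += v(frozenset(coalition)) - before
    return [x / len(perms) for x in phi]

# Interval game: each coalition S gets the closed interval [lower[S], upper[S]].
lower = {frozenset(): 0, frozenset({0}): 0, frozenset({1}): 0, frozenset({0, 1}): 2}
upper = {frozenset(): 0, frozenset({0}): 1, frozenset({1}): 1, frozenset({0, 1}): 4}

# Shapley values of the boundary games give an interval per player.
lo = shapley(2, lambda S: lower[S])
hi = shapley(2, lambda S: upper[S])
print(list(zip(lo, hi)))   # each player receives the interval [1.0, 2.0]
```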
Motivated by applications to online advertising and recommender systems, we consider a game-theoretic model with delayed rewards and asynchronous, payoff-based feedback. In contrast to previous work on delayed multi-armed bandits, we focus on multi-player games with continuous action spaces, and we examine the long-run behavior of strategic agents that follow a no-regret learning policy (but are otherwise oblivious to the game being played, the objectives of their opponents, etc.). To account for the lack of a consistent stream of information (for instance, rewards can arrive out of order, with an a priori unbounded delay, etc.), we introduce a gradient-free learning policy where payoff information is placed in a priority queue as it arrives. In this general context, we derive new bounds for the agents' regret; furthermore, under a standard diagonal concavity assumption, we show that the induced sequence of play converges to Nash equilibrium with probability $1$, even if the delay between choosing an action and receiving the corresponding reward is unbounded.
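To make the priority-queue mechanism concrete, here is a much-simplified, single-agent sketch: payoff observations arrive asynchronously and out of order, are placed in a priority queue keyed by the round that generated them, and are consumed oldest-origin first by a gradient-free, one-point payoff-based update. The payoff function, delay model, and step sizes are all illustrative assumptions, not the paper's algorithm.

```python
import heapq
import random

def payoff(x):                       # unknown to the learner (bandit feedback)
    return -(x - 0.3) ** 2

x, step, radius = 0.0, 0.05, 0.1
in_flight = []                       # (arrival_round, origin_round, reward)
queue = []                           # priority queue ordered by origin round
perturbation = {}                    # origin round -> direction played then

for t in range(1, 201):
    u = random.choice([-1.0, 1.0])   # random perturbation direction
    perturbation[t] = u
    r = payoff(x + radius * u)
    in_flight.append((t + random.randint(1, 10), t, r))   # delayed feedback

    # move observations whose (possibly long) delay has elapsed into the queue
    arrived = [obs for obs in in_flight if obs[0] <= t]
    in_flight = [obs for obs in in_flight if obs[0] > t]
    for _, t0, r0 in arrived:
        heapq.heappush(queue, (t0, r0))

    # gradient-free update, consuming queued feedback oldest-origin first
    while queue:
        t0, r0 = heapq.heappop(queue)
        x += step * (r0 / radius) * perturbation.pop(t0)

print(x)   # drifts toward the payoff-maximizing action (around 0.3)
```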