
Evaluating Strategy Exploration in Empirical Game-Theoretic Analysis

Posted by: Yongzhao Wang
Publication date: 2021
Research field: Informatics Engineering
Language: English





In empirical game-theoretic analysis (EGTA), game models are extended iteratively through a process of generating new strategies based on learning from experience with prior strategies. The strategy exploration problem in EGTA is how to direct this process so as to construct effective models with minimal iteration. A variety of approaches have been proposed in the literature, including methods based on classic techniques and novel concepts. Comparing the performance of these alternatives can be surprisingly subtle, depending sensitively on the criteria adopted and the measures employed. We investigate some of the methodological considerations in evaluating strategy exploration, defining key distinctions and identifying a few general principles based on examples and experimental observations. In particular, we emphasize that an empirical game defines a space of strategies that should be evaluated as a whole. On this basis, we suggest that the minimum regret constrained profile (MRCP) provides a particularly robust criterion for evaluating a space of strategies, and propose a local search method for MRCP that outperforms previous approaches. Computing MRCP is not always feasible, however, especially in large games. For such cases, we highlight consistency considerations for comparing across different approaches. Surprisingly, we find that recent works violate consistency considerations necessary for sound evaluation, which may lead to misleading conclusions about the relative performance of different approaches. We therefore propose a new evaluation scheme and demonstrate that it reveals the true learning performance of different approaches more faithfully than previous evaluation methods.
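As a rough, runnable illustration of the regret-based evaluation idea (a sketch under simplifying assumptions, not the paper's algorithm): for a two-player symmetric empirical game with payoff matrix U, the regret of a symmetric mixed profile is its best deviation gain, and a low-regret profile constrained to a restricted strategy set can be approximated by naive random local search. The payoff matrix, restricted set, and deviation set below are hypothetical placeholders; the paper's MRCP search method is more sophisticated than this.

```python
# Illustrative sketch (not the paper's exact algorithm): regret of a symmetric
# mixed profile in an empirical game, and a naive local search for a
# low-regret profile whose support is constrained to a restricted strategy set.
import numpy as np

def regret(U, x, deviation_set):
    """Regret of symmetric mixed profile x in a 2-player symmetric game.

    U[i, j] is the payoff to a player using strategy i against strategy j;
    deviations are restricted to the strategies in `deviation_set`.
    """
    expected = x @ U @ x                       # payoff of playing x against x
    best_dev = max(U[i, :] @ x for i in deviation_set)
    return best_dev - expected

def local_search_mrcp(U, restricted, deviation_set, iters=2000, step=0.05, seed=0):
    """Naive random local search for a minimum-regret profile supported on `restricted`."""
    rng = np.random.default_rng(seed)
    x = np.zeros(U.shape[0])
    x[restricted] = 1.0 / len(restricted)      # start uniform over the restricted set
    best_x, best_r = x, regret(U, x, deviation_set)
    for _ in range(iters):
        y = best_x.copy()
        y[restricted] += step * rng.normal(size=len(restricted))
        y = np.clip(y, 0.0, None)              # keep probabilities non-negative
        if y.sum() == 0:
            continue
        y /= y.sum()                           # project back onto the simplex
        r = regret(U, y, deviation_set)
        if r < best_r:
            best_x, best_r = y, r
    return best_x, best_r

# Example: rock-paper-scissors payoffs, restricted strategy set {rock, paper},
# regret measured against deviations to all three strategies.
U = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
profile, reg = local_search_mrcp(U, restricted=[0, 1], deviation_set=[0, 1, 2])
print(profile, reg)
```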




Read also

Ying Wen, Hui Chen, Yaodong Yang (2021)
Trust region methods are widely applied in single-agent reinforcement learning problems due to their monotonic performance-improvement guarantee at every iteration. Nonetheless, when applied in multi-agent settings, the guarantee of trust region methods no longer holds because an agent's payoff is also affected by other agents' adaptive behaviors. To tackle this problem, we conduct a game-theoretical analysis in the policy space, and propose a multi-agent trust region learning method (MATRL), which enables trust region optimization for multi-agent learning. Specifically, MATRL finds a stable improvement direction that is guided by the solution concept of Nash equilibrium at the meta-game level. We derive the monotonic improvement guarantee in multi-agent settings and empirically show the local convergence of MATRL to stable fixed points in the two-player rotational differential game. To test our method, we evaluate MATRL in both discrete and continuous multiplayer general-sum games including checker and switch grid worlds, multi-agent MuJoCo, and Atari games. Results suggest that MATRL significantly outperforms strong multi-agent reinforcement learning baselines.
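One way to picture the meta-game step described in this abstract (an illustrative toy of my own, not the MATRL implementation): if each agent's "old" and "candidate" policies induce a 2x2 bimatrix meta-game, its Nash equilibrium can be approximated by fictitious play, and the resulting weights suggest how strongly to move toward the candidate. The payoff numbers below are invented.

```python
# Toy sketch: approximate the Nash equilibrium of a 2x2 bimatrix meta-game
# with fictitious play (empirical frequencies converge for 2x2 games).
import numpy as np

def fictitious_play(A, B, iters=5000):
    """A[i, j]: payoff to agent 1 playing i vs agent 2 playing j; B: agent 2's payoffs."""
    counts1 = np.ones(A.shape[0])   # action counts, uniform prior
    counts2 = np.ones(A.shape[1])
    for _ in range(iters):
        x1 = counts1 / counts1.sum()
        x2 = counts2 / counts2.sum()
        counts1[np.argmax(A @ x2)] += 1     # agent 1 best-responds to agent 2's empirical mix
        counts2[np.argmax(B.T @ x1)] += 1   # agent 2 best-responds to agent 1's empirical mix
    return counts1 / counts1.sum(), counts2 / counts2.sum()

# Rows/columns: index 0 = old policy, index 1 = candidate policy (hypothetical payoffs).
A = np.array([[1.0, 0.2], [1.5, 0.8]])
B = np.array([[1.0, 1.4], [0.3, 0.9]])
x1, x2 = fictitious_play(A, B)
print("agent 1 weights on (old, candidate):", x1)
print("agent 2 weights on (old, candidate):", x2)
```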
Policy-Space Response Oracles (PSRO) is a general algorithmic framework for learning policies in multiagent systems by interleaving empirical game analysis with deep reinforcement learning (Deep RL). At each iteration, Deep RL is invoked to train a best response to a mixture of opponent policies. The repeated application of Deep RL poses an expensive computational burden as we look to apply this algorithm to more complex domains. We introduce two variations of PSRO designed to reduce the amount of simulation required during Deep RL training. Both algorithms modify how PSRO adds new policies to the empirical game, based on learned responses to a single opponent policy. The first, Mixed-Oracles, transfers knowledge from previous iterations of Deep RL, requiring training only against the opponent's newest policy. The second, Mixed-Opponents, constructs a pure-strategy opponent by mixing existing strategies' action-value estimates, instead of their policies. Learning against a single policy mitigates variance in state outcomes that is induced by an unobserved distribution of opponents. We empirically demonstrate that these algorithms substantially reduce the amount of simulation during training required by PSRO, while producing equivalent or better solutions to the game.
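A minimal runnable sketch of the PSRO loop referred to in this abstract, under heavy simplifications: the underlying "game" is a small symmetric zero-sum matrix game, the best-response "oracle" is exact rather than Deep RL, and the meta-game is solved approximately with fictitious play. Function and variable names are my own, not from the paper.

```python
# Toy single-population PSRO on a symmetric zero-sum matrix game (a sketch,
# not the paper's implementation).
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
U = M - M.T                       # antisymmetric payoffs => symmetric zero-sum game

def solve_meta_game(sub_U, iters=2000):
    """Approximate Nash of the empirical sub-game with fictitious play."""
    counts = np.ones(sub_U.shape[0])
    for _ in range(iters):
        mix = counts / counts.sum()
        counts[np.argmax(sub_U @ mix)] += 1
    return counts / counts.sum()

def best_response_oracle(opponent_strategies, opponent_mix):
    """Exact best response to the opponent's meta-mixture (stands in for Deep RL)."""
    expected = sum(w * U[:, s] for s, w in zip(opponent_strategies, opponent_mix))
    return int(np.argmax(expected))

population = [0]                                     # empirical game starts with one strategy
for _ in range(4):                                   # PSRO iterations
    sub_U = U[np.ix_(population, population)]        # empirical meta-game payoffs
    meta_mix = solve_meta_game(sub_U)
    new_strategy = best_response_oracle(population, meta_mix)
    if new_strategy not in population:
        population.append(new_strategy)
print("strategies in the empirical game:", population)
```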
Hanzhou Wu, Xinpeng Zhang (2019)
While many games have been designed for steganography and robust watermarking, few have focused on reversible watermarking. We present a two-encoder game related to the rate-distortion optimization of content-adaptive reversible watermarking. In the game, Alice first hides a payload into a cover. Then, Bob hides another payload into the modified cover. The embedding strategy of Alice affects the embedding capacity of Bob. The embedding strategy of Bob may produce data-extraction errors for Alice. Both want to embed as many pure secret bits as possible, subject to an upper-bounded distortion. We investigate both the non-cooperative game and the cooperative game between Alice and Bob. When they cooperate with each other, one may consider them as a whole, i.e., a single encoder that embeds data into the cover twice. When they do not cooperate, the game corresponds to a separable system, i.e., each wants to independently hide a payload within the cover, but recovering the cover may require cooperation. We find equilibrium strategies for both players under constraints.
An interaction system has a finite set of agents that interact pairwise, depending on the current state of the system. Symmetric decomposition of the matrix of interaction coefficients yields the representation of states by self-adjoint matrices and hence a spectral representation. As a result, cooperation systems, decision systems and quantum systems all become visible as manifestations of special interaction systems. The treatment of the theory is purely mathematical and does not require any special knowledge of physics. It is shown how standard notions in cooperative game theory arise naturally in this context. In particular, Fourier transformation of cooperative games becomes meaningful. Moreover, quantum games fall into this framework. Finally, a theory of Markov evolution of interaction states is presented that generalizes classical homogeneous Markov chains to the present context.
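A small sketch of the symmetric-decomposition step mentioned in this abstract (my reading of the construction, not the paper's notation): split an interaction-coefficient matrix into symmetric and antisymmetric parts, combine them into a self-adjoint (Hermitian) matrix, and read off its spectral representation.

```python
# Split an interaction matrix A into symmetric and antisymmetric parts and
# form a self-adjoint matrix H = S + iK, whose eigendecomposition gives a
# spectral representation with real eigenvalues.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))          # interaction coefficients A[i, j] (hypothetical)

S = (A + A.T) / 2                    # symmetric part
K = (A - A.T) / 2                    # antisymmetric part
H = S + 1j * K                       # self-adjoint since S^T = S and K^T = -K

assert np.allclose(H, H.conj().T)    # H is Hermitian by construction

eigenvalues, eigenvectors = np.linalg.eigh(H)   # real spectrum, orthonormal eigenvectors
print("spectrum of the interaction state:", eigenvalues)
```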
We consider how an agent should update her uncertainty when it is represented by a set P of probability distributions and the agent observes that a random variable X takes on value x, given that the agent makes decisions using the minimax criterion, perhaps the best-studied and most commonly-used criterion in the literature. We adopt a game-theoretic framework, where the agent plays against a bookie, who chooses some distribution from P. We consider two reasonable games that differ in what the bookie knows when he makes his choice. Anomalies that have been observed before, like time inconsistency, can be understood as arising because different games are being played, against bookies with different information. We characterize the important special cases in which the optimal decision rules according to the minimax criterion amount to either conditioning or simply ignoring the information. Finally, we consider the relationship between conditioning and calibration when uncertainty is described by sets of probabilities.
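As a small illustration of the minimax criterion this abstract builds on (a sketch with hypothetical numbers, not the paper's formal bookie games): with a finite set P of distributions over states and a loss table, the agent picks the action minimizing worst-case expected loss; one candidate update rule conditions every member of P on the observation and re-applies the same rule.

```python
# Minimax choice over a set of distributions, before and after conditioning
# each member of the set on an observation (illustrative numbers only).
import numpy as np

# Loss L[a, s]: loss of action a when the true state is s.
L = np.array([[0.0, 10.0],
              [4.0, 4.0]])
# A set P of candidate distributions over the two states.
P = [np.array([0.9, 0.1]), np.array([0.3, 0.7])]

def minimax_action(losses, dist_set):
    worst_case = [max(losses[a] @ p for p in dist_set) for a in range(losses.shape[0])]
    return int(np.argmin(worst_case)), worst_case

print(minimax_action(L, P))          # action chosen before any observation

# Conditioning update: after observing X = x, replace each p in P by its
# posterior given x, then apply the same minimax rule to the updated set.
likelihood = np.array([0.8, 0.2])    # hypothetical P(X = x | state)
P_posterior = [likelihood * p / (likelihood @ p) for p in P]
print(minimax_action(L, P_posterior))
```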