
From Weak Learning to Strong Learning in Fictitious Play Type Algorithms

Published by Soummya Kar
Publication date: 2015
Research field: Informatics Engineering
Language: English





The paper studies the highly prototypical Fictitious Play (FP) algorithm, as well as a broad class of learning processes based on best-response dynamics, which we refer to as FP-type algorithms. A well-known shortcoming of FP is that, while players may learn an equilibrium strategy in some abstract sense, there are no guarantees that the period-by-period strategies generated by the algorithm actually converge to equilibrium themselves. This issue is fundamentally related to the discontinuous nature of the best-response correspondence and is inherited by many FP-type algorithms. Not only does it cause problems in the interpretation of such algorithms as a mechanism for economic and social learning, but it also greatly diminishes the practical value of these algorithms for use in distributed control. We refer to forms of learning in which players learn equilibria in some abstract sense only (to be defined more precisely in the paper) as weak learning, and we refer to forms of learning where players' period-by-period strategies converge to equilibrium as strong learning. An approach is presented for modifying an FP-type algorithm that achieves weak learning in order to construct a variant that achieves strong learning. Theoretical convergence results are proved.
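
The gap between weak and strong learning can be seen in a small experiment. The following is a minimal sketch of classical two-player FP, not the paper's modified algorithm. The game (matching pennies), the function names, the initial actions, and the tie-breaking rule are all assumptions chosen for illustration. Each round, both players best-respond to the opponent's empirical action counts.

```python
def best_response(payoff, opp_counts):
    """Index of the action maximizing payoff against the opponent's action counts.

    payoff[a][b] is the utility of own action a against opponent action b.
    Counts are used instead of frequencies: scaling does not change the argmax,
    and integer arithmetic keeps ties exact. Ties go to the lowest index.
    """
    values = [sum(u * c for u, c in zip(row, opp_counts)) for row in payoff]
    return max(range(len(values)), key=values.__getitem__)


def fictitious_play(payoff_row, payoff_col, rounds):
    """Run two-player FP; return both empirical frequencies and the row history."""
    counts_row = [0] * len(payoff_row)
    counts_col = [0] * len(payoff_col)
    a_row, a_col = 0, 0  # arbitrary initial actions (an assumption)
    history_row = []
    for _ in range(rounds):
        counts_row[a_row] += 1
        counts_col[a_col] += 1
        history_row.append(a_row)
        # Simultaneous best responses to the opponent's empirical play so far.
        a_row, a_col = (best_response(payoff_row, counts_col),
                        best_response(payoff_col, counts_row))
    freq_row = [c / rounds for c in counts_row]
    freq_col = [c / rounds for c in counts_col]
    return freq_row, freq_col, history_row


# Matching pennies: the row player wants to match, the column player to mismatch.
# Both matrices are indexed [own action][opponent action].
MATCH = [[1, -1], [-1, 1]]
MISMATCH = [[-1, 1], [1, -1]]

if __name__ == "__main__":
    freq_row, freq_col, history = fictitious_play(MATCH, MISMATCH, 20000)
    print(freq_row, freq_col)
    print(history[-5:])
```

In this game the empirical frequencies approach the mixed equilibrium (1/2, 1/2), which is the weak sense of learning, while the action actually played in each round remains a pure action that keeps switching, so the period-by-period strategies do not converge.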




Read also

128 - Brian Swenson, Soummya Kar, 2013
The paper is concerned with distributed learning in large-scale games. The well-known fictitious play (FP) algorithm is addressed, which, despite theoretical convergence results, might be impractical to implement in large-scale settings due to intensive computation and communication requirements. An adaptation of the FP algorithm, designated as the empirical centroid fictitious play (ECFP), is presented. In ECFP, players respond to the centroid of all players' actions rather than tracking and responding to the individual actions of every player. Convergence of the ECFP algorithm in terms of average empirical frequency (a notion made precise in the paper) to a subset of the Nash equilibria is proven under the assumption that the game is a potential game with a permutation-invariant potential function. A more general formulation of ECFP is then given (which subsumes FP as a special case) and convergence results are given for the class of potential games. Furthermore, a distributed formulation of the ECFP algorithm is presented, in which players, endowed with a (possibly sparse) preassigned communication graph, engage in local, non-strategic information exchange to eventually agree on a common equilibrium. Convergence results are proven for the distributed ECFP algorithm.
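
The centroid idea can be sketched in a few lines. This is an illustrative reading of the abstract, not the paper's formulation: it assumes the centroid is the component-wise average of the players' empirical distributions, and it uses a hypothetical linear congestion cost (`weights`) so that the expected cost against the centroid has a closed form. All function names are invented for the example.

```python
def centroid(empirical):
    """Component-wise average of the players' empirical distributions."""
    n_players = len(empirical)
    n_actions = len(empirical[0])
    return [sum(dist[a] for dist in empirical) / n_players
            for a in range(n_actions)]


def best_response_to_centroid(q_bar, n_players, weights):
    """Cheapest resource if every other player draws independently from q_bar.

    Hypothetical cost model: using resource a costs weights[a] times the
    number of players on a. With n_players - 1 others each choosing a with
    probability q_bar[a], the expected load on a is 1 + (n_players - 1) * q_bar[a].
    Ties go to the lowest index.
    """
    expected_cost = [w * (1 + (n_players - 1) * q)
                     for w, q in zip(weights, q_bar)]
    return min(range(len(weights)), key=expected_cost.__getitem__)


def ecfp_step(empirical, t, weights):
    """One round after t rounds of play (t >= 1).

    Every player responds to the same centroid, so in this symmetric toy game
    all players pick the same action; each then updates its own empirical
    distribution with the running-average step 1 / (t + 1).
    """
    q_bar = centroid(empirical)
    action = best_response_to_centroid(q_bar, len(empirical), weights)
    step = 1.0 / (t + 1)
    return [[q + step * ((1.0 if a == action else 0.0) - q)
             for a, q in enumerate(dist)]
            for dist in empirical]


if __name__ == "__main__":
    state = [[1.0, 0.0], [0.0, 1.0]]  # two players, two resources
    print(centroid(state))
    print(ecfp_step(state, 1, [1.0, 2.0]))
```

The point of the design is that each player needs only one vector, the centroid, instead of a separate empirical distribution for every opponent.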
175 - B. Swenson, S. Kar, 2015
The paper is concerned with distributed learning and optimization in large-scale settings. The well-known Fictitious Play (FP) algorithm has been shown to achieve Nash equilibrium learning in certain classes of multi-agent games. However, FP can be computationally difficult to implement when the number of players is large. Sampled FP is a variant of FP that mitigates the computational difficulties arising in FP by using a Monte Carlo (i.e., sampling-based) approach. The Sampled FP algorithm has been studied both as a tool for distributed learning and as an optimization heuristic for large-scale problems. Despite its computational advantages, a shortcoming of Sampled FP is that the number of samples that must be drawn in each round of the algorithm grows without bound (on the order of $\sqrt{t}$, where $t$ is the round of the repeated play). In this paper we propose Computationally Efficient Sampled FP (CESFP), a variant of Sampled FP in which only one sample need be drawn each round of the algorithm (a substantial reduction from the $O(\sqrt{t})$ samples per round required by Sampled FP). CESFP operates using a stochastic-approximation type rule to estimate the expected utility from round to round. It is proven that the CESFP algorithm achieves Nash equilibrium learning in the same sense as classical FP and Sampled FP. Simulation results suggest that the convergence rate of CESFP (in terms of repeated-play iterations) is similar to that of Sampled FP.
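
A stochastic-approximation estimate of expected utility from one sample per round might look like the sketch below. This is a generic Robbins-Monro style running average, not the exact CESFP rule: the step size `1 / t`, the function name `sa_utility_estimate`, and the fixed sampling distribution are assumptions made to keep the example small. In the actual algorithm the distribution being sampled changes from round to round.

```python
import random


def sa_utility_estimate(utility, own_actions, sample_opponent, rounds, seed=0):
    """Estimate expected utility of each own action using one sample per round.

    utility(a, s) is the payoff of own action a against sampled opponent play s.
    sample_opponent(rng) draws a single opponent action (a hypothetical hook).
    """
    rng = random.Random(seed)
    estimate = {a: 0.0 for a in own_actions}
    for t in range(1, rounds + 1):
        s = sample_opponent(rng)   # the only sample drawn this round
        step = 1.0 / t             # assumed step size; gives the running mean
        for a in own_actions:
            # Move the estimate a fraction of the way toward the new observation.
            estimate[a] += step * (utility(a, s) - estimate[a])
    return estimate


if __name__ == "__main__":
    # Toy setup: payoff 1 for matching the opponent, who plays 0 with probability 0.7.
    match = lambda a, s: 1.0 if a == s else 0.0
    opponent = lambda rng: 0 if rng.random() < 0.7 else 1
    print(sa_utility_estimate(match, [0, 1], opponent, 20000))
```

With step size `1 / t` the recursion reproduces the sample mean of the observed utilities, so the per-round cost stays constant while earlier samples are still reused through the estimate.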
Stochastic differential games have been used extensively to model agents' competitions in finance, for instance, in P2P lending platforms from the Fintech industry, the banking system for systemic risk, and insurance markets. The recently proposed machine learning algorithm, deep fictitious play, provides a novel efficient tool for finding Markovian Nash equilibrium of large $N$-player asymmetric stochastic differential games [J. Han and R. Hu, Mathematical and Scientific Machine Learning Conference, pages 221-245, PMLR, 2020]. By incorporating the idea of fictitious play, the algorithm decouples the game into $N$ sub-optimization problems, and identifies each player's optimal strategy with the deep backward stochastic differential equation (BSDE) method in parallel and repeatedly. In this paper, we prove the convergence of deep fictitious play (DFP) to the true Nash equilibrium. We can also show that the strategy based on DFP forms an $\varepsilon$-Nash equilibrium. We generalize the algorithm by proposing a new approach to decouple the games, and present numerical results of large population games showing the empirical convergence of the algorithm beyond the technical assumptions in the theorems.
Empirical Centroid Fictitious Play (ECFP) is a generalization of the well-known Fictitious Play (FP) algorithm designed for implementation in large-scale games. In ECFP, the set of players is subdivided into equivalence classes with players in the same class possessing similar properties. Players choose a next-stage action by tracking and responding to aggregate statistics related to each equivalence class. This setup alleviates the difficult task of tracking and responding to the statistical behavior of every individual player, as is the case in traditional FP. Aside from ECFP, many useful modifications have been proposed to classical FP, e.g., rules allowing for network-based implementation, increased computational efficiency, and stronger forms of learning. Such modifications tend to be of great practical value; however, their effectiveness relies heavily on two fundamental properties of FP: robustness to alterations in the empirical distribution step size process, and robustness to best-response perturbations. The main contribution of the paper is to show that similar robustness properties also hold for the ECFP algorithm. This result serves as a first step in enabling practical modifications to ECFP, similar to those already developed for FP.
Securing networked infrastructures is important in the real world. The problem of deploying security resources to protect against an attacker in networked domains can be modeled as Network Security Games (NSGs). Unfortunately, existing approaches, including the deep learning-based approaches, are inefficient at solving large-scale extensive-form NSGs. In this paper, we propose a novel learning paradigm, NSG-NFSP, to solve large-scale extensive-form NSGs based on Neural Fictitious Self-Play (NFSP). Our main contributions include: i) reforming the best response (BR) policy network in NFSP to be a mapping from action-state pair to action-value, to make the calculation of BR possible in NSGs; ii) converting the average policy network of an NFSP agent into a metric-based classifier, helping the agent to assign distributions only on legal actions rather than all actions; iii) enabling NFSP with high-level actions, which can benefit training efficiency and stability in NSGs; and iv) leveraging information contained in graphs of NSGs by learning efficient graph node embeddings. Our algorithm significantly outperforms state-of-the-art algorithms in both scalability and solution quality.