
Effects of Dynamic-Win-Stay-Lose-Learn model with voluntary participation in social dilemma

Published by: Zhenyu Shi
Publication date: 2021
Research field: Informatics Engineering
Research language: English





In recent years, the Win-Stay-Lose-Learn rule has attracted wide attention as an effective strategy-updating rule, and voluntary participation has been proposed by introducing a third strategy into the Prisoner's Dilemma game. Some studies show that combining the Win-Stay-Lose-Learn rule with voluntary participation can promote cooperation significantly under moderate temptation values; however, the survival of cooperators under high aspiration levels and high temptation values remains a challenging problem. In this paper, inspired by Achievement Motivation Theory, a Dynamic-Win-Stay-Lose-Learn rule with voluntary participation is investigated, where a dynamic aspiration process is introduced to describe the co-evolution of individuals' strategies and aspirations. It is found that cooperation is greatly promoted and defection is almost extinct in our model, even when the initial aspiration levels and temptation values are high. The combination of dynamic aspiration and voluntary participation plays an active role, since loners can survive under high initial aspiration levels and expand stably because of their fixed payoffs. The robustness of our model is also discussed, and some adverse structures are identified that should be watched for in the evolutionary process. Our work provides a more realistic model and shows that cooperators may prevail over defectors in an unfavorable initial environment.
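
The update scheme described above can be illustrated with a small simulation. The following is only a minimal sketch, not the authors' code: the lattice size, the payoff parameters (temptation B, loner payoff SIGMA), the aspiration learning rate ALPHA, the initial aspiration value, and the choice of imitating a random neighbour on a "lose" are all illustrative assumptions.

    import random

    SIZE  = 50          # lattice side length
    B     = 1.5         # temptation to defect (weak prisoner's dilemma)
    SIGMA = 0.3         # fixed loner payoff (voluntary participation)
    ALPHA = 0.2         # aspiration learning rate (dynamic aspiration)

    C, D, L = 0, 1, 2   # cooperate, defect, loner

    def pair_payoff(s1, s2):
        """Payoff to s1 against s2; any interaction involving a loner pays SIGMA."""
        if s1 == L or s2 == L:
            return SIGMA
        if s1 == C:
            return 1.0 if s2 == C else 0.0
        return B if s2 == C else 0.0

    def neighbors(i, j):
        return [((i + 1) % SIZE, j), ((i - 1) % SIZE, j),
                (i, (j + 1) % SIZE), (i, (j - 1) % SIZE)]

    strategy   = [[random.choice((C, D, L)) for _ in range(SIZE)] for _ in range(SIZE)]
    aspiration = [[4.0] * SIZE for _ in range(SIZE)]   # high initial aspiration level

    def step():
        # 1) each player accumulates payoff from its four nearest neighbours
        payoff = [[sum(pair_payoff(strategy[i][j], strategy[x][y])
                       for x, y in neighbors(i, j))
                   for j in range(SIZE)] for i in range(SIZE)]
        new_strategy = [row[:] for row in strategy]
        for i in range(SIZE):
            for j in range(SIZE):
                # 2) Win-Stay-Lose-Learn: keep the strategy if satisfied,
                #    otherwise imitate a randomly chosen neighbour
                if payoff[i][j] < aspiration[i][j]:
                    x, y = random.choice(neighbors(i, j))
                    new_strategy[i][j] = strategy[x][y]
                # 3) dynamic aspiration: drift toward the realised payoff
                aspiration[i][j] = (1 - ALPHA) * aspiration[i][j] + ALPHA * payoff[i][j]
        strategy[:] = new_strategy

    for _ in range(200):
        step()
    print("cooperator fraction:",
          sum(row.count(C) for row in strategy) / SIZE ** 2)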




Read also

The Prisoner's Dilemma game is the most commonly used model of spatial evolutionary games and is considered a paradigm for portraying competition among selfish individuals. In recent years, Win-Stay-Lose-Learn, a strategy-updating rule based on aspiration, has been shown to be an effective model for promoting cooperation in the spatial Prisoner's Dilemma game, which has brought aspiration considerable attention. However, in much of this research the assumption that an individual's aspiration is fixed is inconsistent with recent results from psychology. In this paper, drawing on Expected Value Theory and Achievement Motivation Theory, we propose a dynamic aspiration model based on the Win-Stay-Lose-Learn rule in which an individual's aspiration is shaped by its payoff. It is found that dynamic aspiration has a significant impact on the evolutionary process, and different initial aspirations lead to different outcomes, called Stable Coexistence under Low Aspiration, Dependent Coexistence under Moderate Aspiration, and Defection Explosion under High Aspiration, respectively. Furthermore, a detailed analysis is performed on the local structures that cause cooperators' persistence or defectors' expansion, and on the evolutionary process for different parameters, including strategy and aspiration. As a result, the intrinsic structures leading to defectors' expansion and cooperators' survival are identified for the different evolutionary processes, which provides a penetrating understanding of the evolution. Compared with a fixed-aspiration model, dynamic aspiration offers a more satisfactory explanation of population evolution and promotes a deeper comprehension of the Prisoner's Dilemma.
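
The three outcomes above differ only in the initial aspiration level, so the two-strategy version of the model can be explored by reusing the sketch given earlier (its step(), strategy, and aspiration names are assumptions from that sketch, and the sweep values below are arbitrary); starting without loners keeps the population restricted to cooperators and defectors, as in this earlier model.

    # Hypothetical sweep over initial aspiration levels, reusing the earlier sketch.
    # Loners are excluded from the initial population, approximating the
    # two-strategy (cooperator/defector) model described in this abstract.
    for a0 in (0.5, 2.0, 4.0):           # low, moderate, high initial aspiration
        for i in range(SIZE):
            for j in range(SIZE):
                strategy[i][j]   = random.choice((C, D))
                aspiration[i][j] = a0
        for _ in range(200):
            step()
        frac_c = sum(row.count(C) for row in strategy) / SIZE ** 2
        print(f"initial aspiration {a0}: cooperator fraction {frac_c:.2f}")
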
Minjae Kim, Jung-Kyoo Choi, 2021
Evolutionary game theory assumes that players replicate a highly scored player's strategy through genetic inheritance. However, when learning occurs culturally, it is often difficult to recognize someone's strategy just by observing their behaviour. In this work, we consider players with memory-one stochastic strategies in the iterated Prisoner's Dilemma, with the assumption that they cannot directly access each other's strategy but only observe the actual moves for a certain number of rounds. Based on the observation, the observer has to infer the resident strategy in a Bayesian way and choose his or her own strategy accordingly. By examining the best-response relations, we argue that players can escape from full defection into a cooperative equilibrium supported by Win-Stay-Lose-Shift in a self-confirming manner, provided that the cost of cooperation is low and the observational learning supplies sufficiently large uncertainty.
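
As a rough illustration of the inference step, the sketch below performs Bayesian updating over a small discrete grid of memory-one strategies from a short observed history. The candidate grid, the uniform prior, and the example history are assumptions made here for brevity; the paper works with continuous stochastic strategies rather than a grid.

    import itertools

    # A memory-one strategy is (p_CC, p_CD, p_DC, p_DD): the probability of
    # cooperating after each outcome of the previous round (own move, co-player's move).
    GRID = (0.1, 0.5, 0.9)
    CANDIDATES = list(itertools.product(GRID, repeat=4))
    OUTCOME = {("C", "C"): 0, ("C", "D"): 1, ("D", "C"): 2, ("D", "D"): 3}

    def likelihood(strat, history):
        """P(observed resident moves | strat); history is a list of
        (resident move, co-player move) pairs."""
        p = 1.0
        for prev, curr in zip(history, history[1:]):
            p_coop = strat[OUTCOME[prev]]
            p *= p_coop if curr[0] == "C" else 1.0 - p_coop
        return p

    def posterior(history):
        weights = [likelihood(s, history) for s in CANDIDATES]   # uniform prior
        total = sum(weights)
        return [w / total for w in weights]

    # Example: moves consistent with Win-Stay-Lose-Shift; the posterior favours
    # candidates with high p_CC, low p_CD and high p_DD (p_DC is unconstrained
    # because no (D, C) outcome appears in this short history).
    history = [("C", "C"), ("C", "C"), ("C", "D"), ("D", "D"), ("C", "C")]
    post = posterior(history)
    best = max(range(len(CANDIDATES)), key=lambda k: post[k])
    print("most probable candidate:", CANDIDATES[best], "posterior:", round(post[best], 3))
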
Federated learning is a setting where agents, each with access to their own data source, combine models from local data to create a global model. If agents are drawing their data from different distributions, though, federated learning might produce a biased global model that is not optimal for each agent. This means that agents face a fundamental question: should they choose the global model or their local model? We show how this situation can be naturally analyzed through the framework of coalitional game theory. We propose the following game: there are heterogeneous players with different model parameters governing their data distribution and different amounts of data they have noisily drawn from their own distribution. Each player's goal is to obtain a model with minimal expected mean squared error (MSE) on their own distribution. They have a choice of fitting a model based solely on their own data, or combining their learned parameters with those of some subset of the other players. Combining models reduces the variance component of their error through access to more data, but increases the bias because of the heterogeneity of distributions. Here, we derive exact expected MSE values for problems in linear regression and mean estimation. We then analyze the resulting game in the framework of hedonic game theory; we study how players might divide into coalitions, where each set of players within a coalition jointly constructs model(s). We analyze three methods of federation, modeling differing degrees of customization. In uniform federation, the agents collectively produce a single model. In coarse-grained federation, each agent can weight the global model together with their local model. In fine-grained federation, each agent can flexibly combine models from all other agents in the federation. For each method, we analyze the stable partitions of players into coalitions.
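
For the mean-estimation case, the bias-variance trade-off described above can be checked numerically. The sketch below is a Monte Carlo illustration under assumed true means, sample counts, and noise level; it only compares purely local estimation with uniform federation and does not reproduce the paper's exact MSE expressions or the coarse- and fine-grained variants.

    import random

    def simulate(true_means, n_samples, noise_sd, trials=5000):
        """Per-player MSE of the local sample mean vs. a uniform federated
        estimate that pools every player's samples into one global mean."""
        k = len(true_means)
        mse_local, mse_fed = [0.0] * k, [0.0] * k
        for _ in range(trials):
            data = [[random.gauss(true_means[i], noise_sd) for _ in range(n_samples[i])]
                    for i in range(k)]
            local = [sum(d) / len(d) for d in data]
            pooled = sum(sum(d) for d in data) / sum(n_samples)
            for i in range(k):
                mse_local[i] += (local[i] - true_means[i]) ** 2
                mse_fed[i] += (pooled - true_means[i]) ** 2
        return [m / trials for m in mse_local], [m / trials for m in mse_fed]

    # Mildly heterogeneous players with little data: pooling cuts variance by more
    # than the bias it introduces, so uniform federation helps every player here.
    # Spreading the true means further apart makes the bias dominate instead.
    local, fed = simulate(true_means=[0.0, 0.1, 0.2], n_samples=[5, 5, 5], noise_sd=1.0)
    for i, (l, f) in enumerate(zip(local, fed)):
        print(f"player {i}: local MSE {l:.3f}  federated MSE {f:.3f}")
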
In large scale collective decision making, social choice is a normative study of how one ought to design a protocol for reaching consensus. However, in instances where the underlying decision space is too large or complex for ordinal voting, standard voting methods of social choice may be impractical. How then can we design a mechanism - preferably decentralized, simple, scalable, and not requiring any special knowledge of the decision space - to reach consensus? We propose sequential deliberation as a natural solution to this problem. In this iterative method, successive pairs of agents bargain over the decision space using the previous decision as a disagreement alternative. We describe the general method and analyze the quality of its outcome when the space of preferences defines a median graph. We show that sequential deliberation finds a 1.208-approximation to the optimal social cost on such graphs, coming very close to this value with only a small constant number of agents sampled from the population. We also show lower bounds on simpler classes of mechanisms to justify our design choices. We further show that sequential deliberation is ex-post Pareto efficient and has truthful reporting as an equilibrium of the induced extensive form game. We finally show that for general metric spaces, the second moment of the distribution of social cost of the outcomes produced by sequential deliberation is also bounded.
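
To make the mechanism concrete, here is a minimal sketch on the real line, which is a simple median graph with single-peaked preferences. It assumes, as the paper establishes for median graphs, that the pairwise bargaining outcome is the median of the two agents' bliss points and the disagreement alternative; the agent positions, round count, and initial alternative are illustrative.

    import random

    def median_of_three(a, b, c):
        return sorted((a, b, c))[1]

    def sequential_deliberation(bliss_points, rounds=10, seed=0):
        rng = random.Random(seed)
        outcome = rng.choice(bliss_points)          # arbitrary initial alternative
        for _ in range(rounds):
            a, b = rng.sample(bliss_points, 2)      # two sampled agents deliberate,
            outcome = median_of_three(a, b, outcome)  # using the last outcome as the
        return outcome                               # disagreement alternative

    def social_cost(outcome, bliss_points):
        return sum(abs(outcome - p) for p in bliss_points)

    agents = [0.0, 1.0, 2.0, 6.0, 9.0]
    o = sequential_deliberation(agents, rounds=10)
    print("outcome:", o, "social cost:", social_cost(o, agents),
          "optimal (median) cost:", social_cost(sorted(agents)[len(agents) // 2], agents))
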
Solving a reinforcement learning problem typically involves correctly prespecifying the reward signal from which the algorithm learns. Here, we address the problem of reward-signal design by using an evolutionary approach to search the space of possible reward signals. We introduce a general framework for optimizing $N$ goals given $n$ reward signals. Through experiments we demonstrate that such an approach allows agents to learn high-level goals - such as winning, losing and cooperating - from scratch without prespecified reward signals in the game of Pong. Some of the solutions found by the algorithm are surprising, in the sense that they would probably not have been chosen by a person trying to hand-code a given behaviour through a specific reward signal. Furthermore, the proposed approach appears to benefit from higher stability of training performance compared with typical score-based reward signals.
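
The search loop itself can be sketched as a simple evolution strategy over reward-weight vectors. The evaluation function below is a placeholder standing in for "train an agent with the candidate reward and measure the high-level goal"; the population size, mutation scale, and the hypothetical target weighting are assumptions made only for illustration.

    import random

    N_SIGNALS = 4                       # number of primitive reward signals to combine
    POP, GENERATIONS, SIGMA = 20, 30, 0.1

    def evaluate(weights):
        """Placeholder fitness: stands in for training an agent with reward
        r = sum(w_k * signal_k) and returning how well it meets the high-level goal."""
        target = [1.0, -1.0, 0.5, 0.0]  # hypothetical ideal weighting, for illustration
        return -sum((w - t) ** 2 for w, t in zip(weights, target))

    population = [[random.uniform(-1, 1) for _ in range(N_SIGNALS)] for _ in range(POP)]
    for _ in range(GENERATIONS):
        scored = sorted(population, key=evaluate, reverse=True)
        parents = scored[: POP // 4]                  # truncation selection
        population = [
            [w + random.gauss(0, SIGMA) for w in random.choice(parents)]
            for _ in range(POP)
        ]
    best = max(population, key=evaluate)
    print("best reward weights:", [round(w, 2) for w in best])
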