Several multiagent reinforcement learning (MARL) algorithms have been proposed to optimize agents' decisions. Due to the complexity of the problem, the majority of previously developed MARL algorithms assumed that agents had some knowledge of the underlying game (such as its Nash equilibria), observed other agents' actions and the rewards they received, or both. We introduce a new MARL algorithm called the Weighted Policy Learner (WPL), which allows agents to reach a Nash Equilibrium (NE) in benchmark two-player-two-action games with minimal knowledge. Using WPL, the only feedback an agent needs is its own local reward (the agent does not observe other agents' actions or rewards). Furthermore, WPL does not assume that agents know the underlying game or the corresponding Nash Equilibrium a priori. We experimentally show that our algorithm converges in benchmark two-player-two-action games. We also show that our algorithm converges in the challenging Shapley's game, where previous MARL algorithms failed to converge without knowing the underlying game or the NE. Furthermore, we show that WPL outperforms the state-of-the-art algorithms in a more realistic setting of 100 agents interacting and learning concurrently. An important aspect of understanding the behavior of a MARL algorithm is analyzing the dynamics of the algorithm: how the policies of multiple learning agents evolve over time as agents interact with one another. Such an analysis not only verifies whether agents using a given MARL algorithm will eventually converge, but also reveals the behavior of the MARL algorithm prior to convergence. We analyze our algorithm in two-player-two-action games and show that symbolically proving WPL's convergence is difficult because of the non-linear nature of WPL's dynamics, unlike previous MARL algorithms that had either linear or piece-wise-linear dynamics. Instead, we numerically solve the differential equations that describe WPL's dynamics and compare the solution to the dynamics of previous MARL algorithms.
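The abstract above does not state the WPL update rule, so the following is only a minimal sketch under explicit assumptions. It assumes the rule scales each policy gradient by the current probability when the gradient is negative and by one minus that probability when it is positive, and it uses exact gradients computed from the payoff matrices to illustrate the dynamics (an agent running the algorithm as described above would only see its own reward). The function names, the step size `eta`, and the clipping margin `eps` are hypothetical.

```python
def wpl_step(p, q, row_payoff, col_payoff, eta=0.01, eps=1e-3):
    """One simultaneous update in a 2x2 game.

    p: probability that the row player picks its first action.
    q: probability that the column player picks its first action.
    ASSUMPTION: gradients are exact (computed from the payoff tables),
    which is a simplification used only to illustrate the dynamics.
    """
    (r11, r12), (r21, r22) = row_payoff
    (c11, c12), (c21, c22) = col_payoff

    # Gradient of each player's expected payoff w.r.t. its own probability.
    grad_p = q * (r11 - r21) + (1 - q) * (r12 - r22)
    grad_q = p * (c11 - c12) + (1 - p) * (c21 - c22)

    # ASSUMED weighting: shrink the step as the policy nears the boundary
    # that the gradient is pushing it toward.
    weight_p = p if grad_p < 0 else 1 - p
    weight_q = q if grad_q < 0 else 1 - q

    # Keep both policies strictly inside (0, 1).
    new_p = min(1 - eps, max(eps, p + eta * grad_p * weight_p))
    new_q = min(1 - eps, max(eps, q + eta * grad_q * weight_q))
    return new_p, new_q


def simulate(p, q, row_payoff, col_payoff, steps=20000):
    """Iterate the update and return the final pair of policies."""
    for _ in range(steps):
        p, q = wpl_step(p, q, row_payoff, col_payoff)
    return p, q


# Matching pennies: the only Nash equilibrium is the mixed policy (0.5, 0.5).
MATCHING_PENNIES_ROW = ((1, -1), (-1, 1))
MATCHING_PENNIES_COL = ((-1, 1), (1, -1))

if __name__ == "__main__":
    print(simulate(0.2, 0.8, MATCHING_PENNIES_ROW, MATCHING_PENNIES_COL))
```

Under these assumptions the two policies should spiral slowly inward toward the mixed equilibrium rather than cycling around it, which is the kind of non-linear behavior the analysis above refers to.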
Peer punishment of free-riders (defectors) is a key mechanism for promoting cooperation in society. However, it is highly unstable since some cooperators may contribute to a common project but refuse to punish defectors. Centralized sanctioning institutions (for example, tax-funded police and criminal courts) can solve this problem by punishing both defectors and cooperators who refuse to punish. These institutions have been shown to emerge naturally through social learning and then displace all other forms of punishment, including peer punishment. However, this result provokes a number of questions. If centralized sanctioning is so successful, then why do many highly authoritarian states suffer from low levels of cooperation? Why do states with high levels of public good provision tend to rely more on citizen-driven peer punishment? And what happens if centralized institutions can be circumvented by individual acts of bribery? Here, we consider how corruption influences the evolution of cooperation and punishment. Our model shows that the effectiveness of centralized punishment in promoting cooperation breaks down when some actors in the model are allowed to bribe centralized authorities. Counterintuitively, increasing the sanctioning power of the central institution makes things even worse, since this prevents peer punishers from playing a role in maintaining cooperation. As a result, a weaker centralized authority is actually more effective because it allows peer punishment to restore cooperation in the presence of corruption. Our results provide an evolutionary rationale for why public goods provision rarely flourishes in polities that rely only on strong centralized institutions. Instead, cooperation requires both decentralized and centralized enforcement. These results help to explain why citizen participation is a fundamental necessity for policing the commons.
58 - Sherief Abdallah 2009
Several important complex network measures that helped discover common patterns across real-world networks ignore edge weights, an important source of information in real-world networks. We propose a new methodology for generalizing measures of unweighted networks through a generalization of the cardinality concept of a set of weights. The key observation here is that many measures of unweighted networks use the cardinality (the size) of some subset of edges in their computation. For example, the node degree is the number of edges incident to a node. We define the effective cardinality, a new metric that quantifies how many edges are effectively being used, assuming that an edge's weight reflects the amount of interaction across that edge. We prove that a generalized measure, using our method, reduces to the original unweighted measure if there is no disparity between weights, which ensures that the laws that govern the original unweighted measure will also govern the generalized measure when the weights are equal. We also prove that our generalization ensures a partial ordering (among sets of weighted edges) that is consistent with the original unweighted measure, unlike previously developed generalizations. We illustrate the applicability of our method by generalizing four unweighted network measures. As a case study, we analyze four real-world weighted networks using our generalized degree and clustering coefficient. The analysis shows that the generalized degree distribution is consistent with the power-law hypothesis but with a steeper decline, and that there is a common pattern governing the ratio between the generalized degree and the traditional degree. The analysis also shows that nodes with more uniform weights tend to cluster with nodes that also have more uniform weights among themselves.
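The abstract above does not give a formula for the effective cardinality. As an illustrative sketch only, one function with the stated reduction property (it equals the plain edge count when all weights are equal) is two raised to the Shannon entropy of the normalized weights. That choice is an assumption here, and the name `effective_cardinality` is hypothetical.

```python
import math


def effective_cardinality(weights):
    """Entropy-based count of how many edges are 'effectively' used.

    ASSUMPTION: computed as 2 ** H, where H is the Shannon entropy (base 2)
    of the weights normalized to sum to 1. The abstract does not confirm
    this formula; it is one function with the stated reduction property.
    """
    total = sum(weights)
    if total <= 0:
        return 0.0  # no interaction at all
    entropy = -sum(
        (w / total) * math.log2(w / total) for w in weights if w > 0
    )
    return 2.0 ** entropy


if __name__ == "__main__":
    # Four equally weighted edges: behaves like the ordinary degree.
    print(effective_cardinality([3, 3, 3, 3]))
    # Four edges, one of which carries almost all interaction.
    print(effective_cardinality([97, 1, 1, 1]))
```

With uniform weights the value matches the unweighted count of four, while the skewed set yields a value much closer to one, reflecting that a single edge dominates the interaction.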
37 - Sherief Abdallah 2009
Experimental verification has been the method of choice for verifying the stability of a multi-agent reinforcement learning (MARL) algorithm as the number of agents grows and theoretical analysis becomes prohibitively complex. For cooperative agents, where the ultimate goal is to optimize some global metric, stability is usually verified by observing the evolution of the global performance metric over time. If the global metric improves and eventually stabilizes, this is considered a reasonable verification of the system's stability. The main contribution of this note is establishing the need for better experimental frameworks and measures to assess the stability of large-scale adaptive cooperative systems. We show an experimental case study where the stability of the global performance metric can be rather deceiving, hiding an underlying instability in the system that later leads to a significant drop in performance. We then propose an alternative metric that relies on agents' local policies and show, experimentally, that our proposed metric is more effective than the traditional global performance metric in exposing the instability of MARL algorithms.
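The abstract above does not define the proposed policy-based metric. The toy below is therefore a hypothetical illustration of the general point rather than the authors' metric or experiment: two synthetic agents whose policies oscillate in opposite directions keep an averaged global score flat, while a simple measure of per-step policy change (the names `global_metric` and `policy_change` are assumptions) shows that the policies never settle.

```python
import math


def global_metric(policies):
    """Hypothetical global score: the average of the agents' probabilities."""
    return sum(policies) / len(policies)


def policy_change(previous, current):
    """Hypothetical local metric: mean absolute change in policy per agent."""
    return sum(abs(c - p) for p, c in zip(previous, current)) / len(current)


def synthetic_policies(t):
    """Two made-up agents oscillating in antiphase (not a learning algorithm)."""
    swing = 0.4 * math.sin(0.3 * t)
    return [0.5 + swing, 0.5 - swing]


if __name__ == "__main__":
    history = [synthetic_policies(t) for t in range(100)]
    scores = [global_metric(p) for p in history]
    changes = [policy_change(a, b) for a, b in zip(history, history[1:])]
    print("global score range:", max(scores) - min(scores))
    print("largest policy change:", max(changes))
```

In this synthetic setup the global score has essentially no variation, yet the policy-change measure stays well above zero, which is the kind of hidden instability a purely global view can miss.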
79 - Sherief Abdallah 2009
A key measure that has been used extensively in analyzing complex networks is the degree of a node (the number of the node's neighbors). Because of its discrete nature, when the degree measure was used in analyzing weighted networks, weights were either ignored or thresholded in order to retain or disregard an edge. Therefore, despite its popularity, the degree measure fails to capture the disparity of interaction between a node and its neighbors. We introduce in this paper a generalization of the degree measure that addresses this limitation: the continuous node degree (C-degree). The C-degree of a node reflects how many neighbors are effectively being used, taking interaction disparity into account. More importantly, if a node interacts uniformly with its neighbors (no interaction disparity), we prove that the C-degree of the node becomes identical to the node's (discrete) degree. We analyze four real-world weighted networks using the new measure and show that the C-degree distribution follows a power law, similar to the traditional degree distribution, but with a steeper decline. We also show that the ratio between the C-degree and the (discrete) degree follows a pattern that is common to the four studied networks.