
Reinforcement Learning for Unified Allocation and Patrolling in Signaling Games with Uncertainty

Published by: Harshavardhan Kamarthi
Publication date: 2020
Research field: Informatics Engineering
Language: English





Green Security Games (GSGs) have been successfully used in the protection of valuable resources such as fisheries, forests and wildlife. While real-world deployment involves both resource allocation and subsequent coordinated patrolling with communication and real-time, uncertain information, previous game models do not fully address both of these stages simultaneously. Furthermore, adopting existing solution strategies is difficult since they do not scale well for larger, more complex variants of the game models. We therefore first propose a novel GSG model that combines defender allocation, patrolling, real-time drone notification to human patrollers, and drones sending warning signals to attackers. The model further incorporates uncertainty for real-time decision-making within a team of drones and human patrollers. Second, we present CombSGPO, a novel and scalable algorithm based on reinforcement learning, to compute a defender strategy for this game model. CombSGPO performs policy search over a multi-dimensional, discrete action space to compute an allocation strategy that is best suited to a best-response patrolling strategy for the defender, learnt by training a multi-agent Deep Q-Network. We show via experiments that CombSGPO converges to better strategies and is more scalable than comparable approaches. Third, we provide a detailed analysis of the coordination and signaling behavior learnt by CombSGPO, showing group formation between defender resources and patrolling formations based on signaling and notifications between resources. Importantly, we find that strategic signaling emerges in the final learnt strategy. Finally, we perform experiments to evaluate these strategies under different levels of uncertainty.
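The allocation stage described above, a policy search over a multi-dimensional, discrete action space scored by a patrolling strategy, can be illustrated with a minimal sketch. This is not the CombSGPO implementation. It assumes a score-function (REINFORCE-style) update with one independent categorical distribution per defender resource, and it replaces the trained multi-agent Deep Q-Network with a hypothetical stand-in, `patrol_value`, that simply sums a made-up per-cell density over the distinct cells covered. All function names, parameters and numbers are illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, rng):
    """Draw one index from a categorical distribution."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def patrol_value(allocation, density):
    # Hypothetical stand-in for the return of a learnt patrolling policy:
    # each distinct covered cell contributes its density once.
    return sum(density[c] for c in set(allocation))


def allocation_policy_search(density, n_resources, steps=3000, lr=0.1, seed=0):
    """REINFORCE over a factored categorical policy (one row per resource)."""
    rng = random.Random(seed)
    n_cells = len(density)
    logits = [[0.0] * n_cells for _ in range(n_resources)]
    baseline = 0.0  # running average reward, used to reduce variance
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        allocation = [sample_index(p, rng) for p in probs]
        reward = patrol_value(allocation, density)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for k, cell in enumerate(allocation):
            for j in range(n_cells):
                # Gradient of log-probability of the sampled cell w.r.t. logit j.
                grad = (1.0 if j == cell else 0.0) - probs[k][j]
                logits[k][j] += lr * advantage * grad
    # Report the most likely cell for each resource.
    return [max(range(n_cells), key=row.__getitem__) for row in logits]


if __name__ == "__main__":
    toy_density = [0.1, 0.9, 0.2, 0.8]  # made-up values for illustration
    print(allocation_policy_search(toy_density, n_resources=2))
```

In the setting the abstract describes, the evaluation step would come from the patrolling strategy learnt for a given allocation rather than from a closed-form function, which is what makes the two stages coupled.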


Read also

Much of the recent success in multiagent reinforcement learning has been in two-player zero-sum games. In these games, algorithms such as fictitious self-play and minimax tree search can converge to an approximate Nash equilibrium. While playing a Nash equilibrium strategy in a two-player zero-sum game is optimal, in an $n$-player general-sum game, it becomes a much less informative solution concept. Despite the lack of a satisfying solution concept, $n$-player games form the vast majority of real-world multiagent situations. In this paper we present a new framework for research in reinforcement learning in $n$-player games. We hope that by analyzing behavior learned by agents in these environments the community can better understand this important research area and move toward meaningful solution concepts and research directions. The implementation and additional information about this framework can be found at https://colosseumrl.igb.uci.edu/.
The continuous patrolling game studied here was first proposed in Alpern et al. (2011), which studied a discrete time game where facilities to be protected were modeled as the nodes of a graph. Here we consider protecting roads or pipelines, modeled as the arcs of a continuous network $Q$. The Attacker chooses a point of $Q$ to attack during a chosen time interval of fixed duration (the attack time, $\alpha$). The Patroller chooses a unit speed path on $Q$ and intercepts the attack (and wins) if she visits the attacked point during the attack time interval. Solutions to the game have previously been given in certain special cases. Here, we analyze the game on arbitrary networks. Our results include the following: (i) a solution to the game for any network $Q$, as long as $\alpha$ is sufficiently short, generalizing the known solutions for circle or Eulerian networks and the network with two nodes joined by three arcs; (ii) a solution to the game for all tree networks that satisfy a condition on their extremities. We present a conjecture on the solution of the game for arbitrary trees and establish it in certain cases.
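The interception rule in this abstract can be made concrete on the simplest network, a circle. The sketch below assumes one particular Patroller strategy that is not taken from the paper: she starts at a uniformly random point and walks in a fixed direction at unit speed. Under that assumption an attack of duration `alpha` on a circle of circumference `length` is intercepted exactly when the attacked point lies at most `alpha` ahead of her when the attack begins, so the interception probability is `min(1, alpha / length)`. The function names and the Monte Carlo check are illustrative, and no claim is made here about optimal play.

```python
import random


def intercepts(start, attack_point, attack_start, alpha, length):
    """True if a unit-speed patroller circling from `start` visits
    `attack_point` during [attack_start, attack_start + alpha]."""
    # Patroller position at time t is (start + t) mod length.
    pos_at_attack = (start + attack_start) % length
    # Distance she still has to travel, in her walking direction.
    gap = (attack_point - pos_at_attack) % length
    return gap <= alpha


def estimate_interception(alpha, length, trials=100000, seed=0):
    """Monte Carlo estimate of the interception probability when the
    patroller's starting point is uniform on the circle (an assumption)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        start = rng.uniform(0.0, length)
        if intercepts(start, 0.0, 0.0, alpha, length):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # The estimate should be close to alpha / length = 0.2.
    print(estimate_interception(alpha=2.0, length=10.0))
```

Because the starting point is uniform, the attacked point and the attack start time do not affect the probability, which is why the estimate fixes both at zero.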
David Mguni, Yutong Wu, Yali Du (2021)
Multi-agent reinforcement learning (MARL) has become effective in tackling discrete cooperative game scenarios. However, MARL has yet to penetrate settings beyond those modelled by team and zero-sum games, confining it to a small subset of multi-agent systems. In this paper, we introduce a new generation of MARL learners that can handle nonzero-sum payoff structures and continuous settings. In particular, we study the MARL problem in a class of games known as stochastic potential games (SPGs) with continuous state-action spaces. Unlike cooperative games, in which all agents share a common reward, SPGs are capable of modelling real-world scenarios where agents seek to fulfil their individual goals. We prove theoretically that our learning method, SPot-AC, enables independent agents to learn Nash equilibrium strategies in polynomial time. We demonstrate that our framework tackles previously unsolvable tasks such as Coordination Navigation and large selfish routing games, and that it outperforms state-of-the-art MARL baselines such as MADDPG and COMIX in such scenarios.
We present a new type of coordination mechanism among multiple agents for the allocation of a finite resource, such as the allocation of time slots for passing an intersection. We consider the setting where we associate one counter to each agent, which we call karma value, and where there is an established mechanism to decide resource allocation based on agents exchanging karma. The idea is that agents might be inclined to pass on using resources today, in exchange for karma, which will make it easier for them to claim the resource use in the future. To understand whether such a system might work robustly, we only design the protocol and not the agents' policies. We take a game-theoretic perspective and compute policies corresponding to Nash equilibria for the game. We find, surprisingly, that the Nash equilibria for a society of self-interested agents are very close in social welfare to a centralized cooperative solution. These results suggest that many resource allocation problems can have a simple, elegant, and robust solution, assuming the availability of a karma accounting mechanism.
Security Games employ game theoretical tools to derive resource allocation strategies in security domains. Recent works considered the presence of alarm systems, even suffering various forms of uncertainty, and showed that disregarding alarm signals may lead to arbitrarily bad strategies. The central problem with an alarm system, unexplored in other Security Games, is finding the best strategy to respond to alarm signals for each mobile defensive resource. The literature provides results for the basic single-resource case, showing that even in that case the problem is computationally hard. In this paper, we focus on the challenging problem of designing algorithms scaling with multiple resources. First, we focus on finding the minimum number of resources assuring non-null protection to every target. Then, we deal with the computation of multi-resource strategies with different degrees of coordination among resources. For each considered problem, we provide a computational analysis and propose algorithmic methods.