ترغب بنشر مسار تعليمي؟ اضغط هنا

Qualitative Planning in Imperfect Information Games with Active Sensing and Reactive Sensor Attacks: Cost of Unawareness

71   0   0.0 ( 0 )
 نشر من قبل Abhishek Kulkarni
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We consider the probabilistic planning problem where the agent (called Player 1, or P1) can jointly plan the control actions and sensor queries in a sensor network and an attacker (called player 2, or P2) can carry out attacks on the sensors. We model such an adversarial interaction using a formal model -- a reachability game with partially controllable observation functions. The main contribution of this paper is to assess the cost of P1s unawareness: Suppose P1 misinterprets the sensor failures as probabilistic node failures due to unreliable network communication, and P2 is aware of P1s misinterpretation in addition to her partial observability. Then, from which states can P2 carry out sensor attacks to ensure, with probability one, that P1 will not be able to complete her reachability task even though, due to misinterpretation, P1 believes that she can almost-surely achieve her task. We develop an algorithm to solve the almost-sure winning sensor-attack strategy given P1s observation-based strategy. Our attack analysis could be used for attack detection in wireless communication networks and the design of provably secured attack-aware sensor allocation in decision-theoretic models for cyber-physical systems.



قيم البحث

اقرأ أيضاً

A common goal in the areas of secure information flow and privacy is to build effective defenses against unwanted leakage of information. To this end, one must be able to reason about potential attacks and their interplay with possible defenses. In t his paper, we propose a game-theoretic framework to formalize strategies of attacker and defender in the context of information leakage, and provide a basis for developing optimal defense methods. A novelty of our games is that their utility is given by information leakage, which in some cases may behave in a non-linear way. This causes a significant deviation from classic game theory, in which utility functions are linear with respect to players strategies. Hence, a key contribution of this paper is the establishment of the foundations of information leakage games. We consider two kinds of games, depending on the notion of leakage considered. The first kind, the QIF-games, is tailored for the theory of quantitative information flow (QIF). The second one, the DP-games, corresponds to differential privacy (DP).
We study tree games developed recently by Matteo Mio as a game interpretation of the probabilistic $mu$-calculus. With expressive power comes complexity. Mio showed that tree games are able to encode Blackwell games and, consequently, are not determi ned under deterministic strategies. We show that non-stochastic tree games with objectives recognisable by so-called game automata are determined under deterministic, finite memory strategies. Moreover, we give an elementary algorithmic procedure which, for an arbitrary regular language L and a finite non-stochastic tree game with a winning objective L decides if the game is determined under deterministic strategies.
Search has played a fundamental role in computer game research since the very beginning. And while online search has been commonly used in perfect information games such as Chess and Go, online search methods for imperfect information games have only been introduced relatively recently. This paper addresses the question of what is a sound online algorithm in an imperfect information setting of two-player zero-sum games. We argue that the~fixed-strategy~definitions of exploitability and $epsilon$-Nash equilibria are ill-suited to measure an online algorithms worst-case performance. We thus formalize $epsilon$-soundness, a concept that connects the worst-case performance of an online algorithm to the performance of an $epsilon$-Nash equilibrium. As $epsilon$-soundness can be difficult to compute in general, we introduce a consistency framework -- a hierarchy that connects an online algorithms behavior to a Nash equilibrium. These multiple levels of consistency describe in what sense an online algorithm plays just like a fixed Nash equilibrium. These notions further illustrate the difference between perfect and imperfect information settings, as the same consistency guarantees have different worst-case online performance in perfect and imperfect information games. The definitions of soundness and the consistency hierarchy finally provide appropriate tools to analyze online algorithms in repeated imperfect information games. We thus inspect some of the previous online algorithms in a new light, bringing new insights into their worst-case performance guarantees.
This volume contains the proceedings of the 11th International Symposium on Games, Automata, Logic and Formal Verification (GandALF 2020). The symposium took place as a fully online event on September 21-22, 2020. The GandALF symposium was establishe d by a group of Italian computer scientists interested in mathematical logic, automata theory, game theory, and their applications to the specification, design, and verification of complex systems. Its aim is to provide a forum where people from different areas, and possibly with different backgrounds, can fruitfully interact. GandALF has a truly international spirit, as witnessed by the composition of the program and steering committee and by the country distribution of the submitted papers.
To learn good joint policies for multi-agent collaboration with imperfect information remains a fundamental challenge. While for two-player zero-sum games, coordinate-ascent approaches (optimizing one agents policy at a time, e.g., self-play) work wi th guarantees, in multi-agent cooperative setting they often converge to sub-optimal Nash equilibrium. On the other hand, directly modeling joint policy changes in imperfect information game is nontrivial due to complicated interplay of policies (e.g., upstream updates affect downstream state reachability). In this paper, we show global changes of game values can be decomposed to policy changes localized at each information set, with a novel term named policy-change density. Based on this, we propose Joint Policy Search(JPS) that iteratively improves joint policies of collaborative agents in imperfect information games, without re-evaluating the entire game. On multi-agent collaborative tabular games, JPS is proven to never worsen performance and can improve solutions provided by unilateral approaches (e.g, CFR), outperforming algorithms designed for collaborative policy learning (e.g. BAD). Furthermore, for real-world games, JPS has an online form that naturally links with gradient updates. We test it to Contract Bridge, a 4-player imperfect-information game where a team of $2$ collaborates to compete against the other. In its bidding phase, players bid in turn to find a good contract through a limited information channel. Based on a strong baseline agent that bids competitive bridge purely through domain-agnostic self-play, JPS improves collaboration of team players and outperforms WBridge5, a championship-winning software, by $+0.63$ IMPs (International Matching Points) per board over 1k games, substantially better than previous SoTA ($+0.41$ IMPs/b) under Double-Dummy evaluation.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا