Verification of indefinite-horizon POMDPs

Published by Sebastian Junges
Publication date: 2020
Research field: Informatics Engineering
Language: English





The verification problem in MDPs asks whether, for any policy resolving the nondeterminism, the probability that something bad happens is bounded by some given threshold. This verification problem is often overly pessimistic, as the policies it considers may depend on the complete system state. This paper considers the verification problem for partially observable MDPs, in which the policies make their decisions based on (the history of) the observations emitted by the system. We present an abstraction-refinement framework extending previous instantiations of the Lovejoy approach. Our experiments show that this framework significantly improves the scalability of the approach.
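
For readers who want the problem statement in symbols, here is a sketch in standard notation (not quoted from the paper):

```latex
% Standard notation, not quoted from the paper.
% MDP verification: every policy must keep the probability of
% reaching the set B of bad states below the threshold \lambda:
\forall \sigma \colon \quad \Pr{}^{\sigma}_{\mathcal{M}}(\lozenge B) \;\le\; \lambda
% POMDP verification quantifies only over observation-based policies,
% i.e., maps from observation histories to actions:
\sigma \colon \mathit{Obs}^{*} \to \mathit{Act}
```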


Read also

The synthesis problem for partially observable Markov decision processes (POMDPs) is to compute a policy that satisfies a given specification. Such policies have to take the full execution history of a POMDP into account, rendering the problem undecidable in general. A common approach is to use a limited amount of memory and randomize over potential choices. Yet, this problem is still NP-hard and often computationally intractable in practice. A restricted problem is to use neither history nor randomization, yielding policies that are called stationary and deterministic. Previous approaches to compute such policies employ mixed-integer linear programming (MILP). We provide a novel MILP encoding that supports sophisticated specifications in the form of temporal logic constraints. It is able to handle an arbitrary number of such specifications. Yet, randomization and memory are often mandatory to achieve satisfactory policies. First, we extend our encoding to deliver a restricted class of randomized policies. Second, based on the results of the original MILP, we employ a preprocessing of the POMDP to encompass memory-based decisions. The advantages of our approach over state-of-the-art POMDP solvers lie (1) in the flexibility to strengthen simple deterministic policies without losing computational tractability and (2) in the ability to enforce the provable satisfaction of arbitrarily many specifications. The latter point allows taking trade-offs between performance and safety aspects of typical POMDP examples into account. We show the effectiveness of our method on a broad range of benchmarks.
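
As a deliberately tiny illustration of the MILP idea (this is not the paper's encoding; the toy POMDP, variable names, and the big-M trick below are assumptions for illustration), binary variables pick one action per observation and the solver maximizes goal reachability. The sketch uses the PuLP library:

```python
# Tiny MILP sketch in the spirit of the abstract (NOT the paper's
# encoding): binary variables pick one action per *observation*, and the
# solver maximizes goal reachability on an invented toy POMDP.
# Requires the PuLP library (pip install pulp).
import pulp

actions = ["a", "b"]
obs = {"s0": "o0", "s1": "o0", "goal": "og", "bad": "ob"}
# P[s][a] = [(successor, probability)]; goal and bad are absorbing sinks.
P = {"s0": {"a": [("s1", 1.0)],                "b": [("bad", 1.0)]},
     "s1": {"a": [("goal", 0.5), ("s0", 0.5)], "b": [("bad", 1.0)]}}

m = pulp.LpProblem("stationary_deterministic_policy", pulp.LpMaximize)
obs_ids = {obs[s] for s in P}                 # observations with a choice
sigma = {(o, a): pulp.LpVariable(f"sig_{o}_{a}", cat="Binary")
         for o in obs_ids for a in actions}   # sigma[o,a]=1: play a on o
p = {s: pulp.LpVariable(f"p_{s}", lowBound=0, upBound=1) for s in obs}

for o in obs_ids:                             # exactly one action per obs
    m += pulp.lpSum(sigma[o, a] for a in actions) == 1
m += p["goal"] == 1
m += p["bad"] == 0
for s in P:
    for a in actions:
        # Active only if a is chosen for obs(s); the +1 slack otherwise
        # relaxes the bound completely, since p is confined to [0, 1].
        m += p[s] <= (pulp.lpSum(pr * p[t] for t, pr in P[s][a])
                      + (1 - sigma[obs[s], a]))
m += p["s0"]                                  # objective: Pr[goal] from s0

m.solve(pulp.PULP_CBC_CMD(msg=False))
print({a: int(pulp.value(sigma["o0", a])) for a in actions},
      pulp.value(p["s0"]))
```

One caveat: with upper-bound constraints and a maximizing objective, end components that can reach neither goal nor bad would admit spurious solutions; the toy model avoids this because every loop can exit to a sink, and real encodings need dedicated constraints for it.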
Investigating the role of causal order in quantum mechanics has recently revealed that the causal distribution of events may not be a priori well-defined in quantum theory. While this has triggered a growing interest on the theoretical side, creating processes without a causal order is an experimental task. Here we report the first decisive demonstration of a process with an indefinite causal order. To do this, we quantify how incompatible our set-up is with a definite causal order by measuring a causal witness. This mathematical object incorporates a series of measurements which are designed to yield a certain outcome only if the process under examination is not consistent with any well-defined causal order. In our experiment we perform a measurement in a superposition of causal orders - without destroying the coherence - to acquire information both inside and outside of a causally non-ordered process. Using this information, we experimentally determine a causal witness, demonstrating by almost seven standard deviations that the experimentally implemented process does not have a definite causal order.
Partially-Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs by, for instance, restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: A novel SAT-based iterative approach and a decision-diagram based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.
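
To illustrate what such a winning region ranges over, the sketch below enumerates belief supports of an invented toy POMDP explicitly and computes the greatest fixed point for the safety part of the specification ("never visit a bad state"). The paper's SAT- and decision-diagram-based algorithms compute such regions symbolically rather than by this naive enumeration; all names and the model here are assumptions:

```python
# Explicit-state sketch of the belief-support fixed point behind the
# winning region (safety part: never visit a bad state). Invented toy
# POMDP; the paper computes such sets symbolically instead.
from itertools import combinations

# T[s][a] = set of possible successors (probabilities are irrelevant for
# almost-sure safety; only the support graph matters).
T = {0: {"a": {0, 1}, "b": {2}},
     1: {"a": {1},    "b": {2}},
     2: {"a": {2},    "b": {2}}}
OBS = {0: "x", 1: "x", 2: "y"}   # observation emitted in each state
BAD = {2}
ACTIONS = ["a", "b"]

def obs_successors(support, action):
    """Successor belief supports, split by the observation received."""
    succ = set().union(*(T[s][action] for s in support))
    by_obs = {}
    for s in succ:
        by_obs.setdefault(OBS[s], set()).add(s)
    return [frozenset(v) for v in by_obs.values()]

# Greatest fixed point: keep a support iff it contains no bad state and
# some action keeps every observation-successor inside the candidate set.
all_supports = {frozenset(c)
                for r in range(1, len(T) + 1)
                for c in combinations(T, r)}
W = {B for B in all_supports if not (B & BAD)}
changed = True
while changed:
    changed = False
    for B in list(W):
        if not any(all(S in W for S in obs_successors(B, a))
                   for a in ACTIONS):
            W.discard(B)
            changed = True

print(sorted(map(sorted, W)))   # here: [[0], [0, 1], [1]]
```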
Uncertain partially observable Markov decision processes (uPOMDPs) allow the probabilistic transition and observation functions of standard POMDPs to belong to a so-called uncertainty set. Such uncertainty, referred to as epistemic uncertainty, captures uncountable sets of probability distributions caused by, for instance, a lack of data available. We develop an algorithm to compute finite-memory policies for uPOMDPs that robustly satisfy specifications against any admissible distribution. In general, computing such policies is theoretically and practically intractable. We provide an efficient solution to this problem in four steps. (1) We state the underlying problem as a nonconvex optimization problem with infinitely many constraints. (2) A dedicated dualization scheme yields a dual problem that is still nonconvex but has finitely many constraints. (3) We linearize this dual problem and (4) solve the resulting finite linear program to obtain locally optimal solutions to the original problem. The resulting problem formulation is exponentially smaller than those resulting from existing methods. We demonstrate the applicability of our algorithm using large instances of an aircraft collision-avoidance scenario and a novel spacecraft motion planning case study.
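
To make step (2) concrete under a simplifying assumption: for an interval uncertainty set (chosen here purely for illustration; the paper's dualization scheme targets its own, more general setting), standard LP duality replaces the semi-infinite robust constraint by finitely many constraints:

```latex
% Illustration only: interval uncertainty set
%   \mathcal{U} = \{\, P \mid \ell \le P \le u,\ \mathbf{1}^{\top}P = 1 \,\}.
% The semi-infinite robust constraint
%   \sup_{P \in \mathcal{U}} P^{\top} v \;\le\; r
% becomes, by LP duality, the finitely many constraints
%   \exists\, \lambda \in \mathbb{R},\ \mu \ge 0,\ \nu \ge 0 :
%     \lambda + u^{\top}\mu - \ell^{\top}\nu \le r,
%     \qquad \lambda\mathbf{1} + \mu - \nu = v .
```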
In this work we study the cost of local and global proofs in distributed verification. In this setting the nodes of a distributed system are provided with a nondeterministic proof for the correctness of the state of the system, and the nodes need to verify this proof by looking at only their local neighborhood in the system. Previous works have studied the model where each node is given its own, possibly unique, part of the proof as input. The cost of a proof is the maximum size of an individual label. We compare this model to a model where each node has access to the same global proof, and the cost is the size of this global proof. It is easy to see that a global proof can always include all of the local proofs, and every local proof can be a copy of the global proof. We show that there exist properties that exhibit these relative proof sizes, and also properties that are somewhere in between. In addition, we introduce a new lower bound technique and use it to prove a tight lower bound on the complexity of reversing distributed decision, and establish a link between communication complexity and distributed proof complexity.
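
As a toy instance of the local model (an invented textbook-style example, not one of the paper's constructions): a proof labeling scheme can certify that parent pointers form a tree rooted at a fixed node by labeling each node with its tree distance; every node verifies only its own label against its parent's. Concatenating all local labels would be one, typically wasteful, global proof.

```python
# Invented example, not from the paper: a proof labeling scheme
# certifying that `parent` pointers form a tree rooted at `root`.
# The proof assigns each node its tree distance; verification is local.
def verify_locally(parent, label, root):
    for v in parent:
        if v == root:
            # The root certifies distance 0 and has no parent.
            if parent[v] is not None or label[v] != 0:
                return False
        else:
            p = parent[v]
            # Local check: my certified distance is my parent's plus one.
            # Labels strictly decrease toward the root, so no cycle can
            # pass verification at every node.
            if p is None or label[v] != label[p] + 1:
                return False
    return True

parent = {"r": None, "u": "r", "v": "u"}
label = {"r": 0, "u": 1, "v": 2}            # the distributed (local) proof
print(verify_locally(parent, label, "r"))   # True
label["v"] = 5                              # forge one node's certificate
print(verify_locally(parent, label, "r"))   # False: rejected locally at v
```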
