Enforcing Almost-Sure Reachability in POMDPs

197 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Nils Jansen

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Sebastian Junges - Nils Jansen - Sanjit A. Seshia

الذكاء الاصطناعي علم الروبوتات أنظمة وتحكم

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Partially-Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs by, for instance, restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: A novel SAT-based iterative approach and a decision-diagram based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.

قيم البحث

75 - Murat Cubuktepe , Nils Jansen , Sebastian Junges 2020

Uncertain partially observable Markov decision processes (uPOMDPs) allow the probabilistic transition and observation functions of standard POMDPs to belong to a so-called uncertainty set. Such uncertainty, referred to as epistemic uncertainty, captu res uncountable sets of probability distributions caused by, for instance, a lack of data available. We develop an algorithm to compute finite-memory policies for uPOMDPs that robustly satisfy specifications against any admissible distribution. In general, computing such policies is theoretically and practically intractable. We provide an efficient solution to this problem in four steps. (1) We state the underlying problem as a nonconvex optimization problem with infinitely many constraints. (2) A dedicated dualization scheme yields a dual problem that is still nonconvex but has finitely many constraints. (3) We linearize this dual problem and (4) solve the resulting finite linear program to obtain locally optimal solutions to the original problem. The resulting problem formulation is exponentially smaller than those resulting from existing methods. We demonstrate the applicability of our algorithm using large instances of an aircraft collision-avoidance scenario and a novel spacecraft motion planning case study.

الذكاء الاصطناعي التعلم الآلي أنظمة وتحكم

Sparse tree search optimality guarantees in POMDPs with continuous observation spaces

116 - Michael H. Lim , Claire J. Tomlin , Zachary N. Sunberg 2019

Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-ba sed algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power.

التعلم الآلي علم الروبوتات أنظمة وتحكم

Almost sure convergence in quantum spin glasses

67 - David Buzinski , Elizabeth Meckes 2015

Recently, Keating, Linden, and Wells cite{KLW} showed that the density of states measure of a nearest-neighbor quantum spin glass model is approximately Gaussian when the number of particles is large. The density of states measure is the ensemble ave rage of the empirical spectral measure of a random matrix; in this paper, we use concentration of measure and entropy techniques together with the result of cite{KLW} to show that in fact, the empirical spectral measure of such a random matrix is almost surely approximately Gaussian itself, with no ensemble averaging. We also extend this result to a spherical quantum spin glass model and to the more general coupling geometries investigated by ErdH{o}s and Schroder.

الفيزياء الرياضية الفيزياء الرياضية الاحتمالات

Flow: A Modular Learning Framework for Autonomy in Traffic

100 - Cathy Wu , Aboudy Kreidieh , Kanaad Parvate 2017

The rapid development of autonomous vehicles (AVs) holds vast potential for transportation systems through improved safety, efficiency, and access to mobility. However, due to numerous technical, political, and human factors challenges, new methodolo gies are needed to design vehicles and transportation systems for these positive outcomes. This article tackles technical challenges arising from the partial adoption of autonomy: partial control, partial observation, complex multi-vehicle interactions, and the sheer variety of traffic settings represented by real-world networks. The article presents a modular learning framework which leverages deep Reinforcement Learning methods to address complex traffic dynamics. Modules are composed to capture common traffic phenomena (traffic jams, lane changing, intersections). Learned control laws are found to exceed human driving performance by at least 40% with only 5-10% adoption of AVs. In partially-observed single-lane traffic, a small neural network control law can eliminate stop-and-go traffic -- surpassing all known model-based controllers, achieving near-optimal performance, and generalizing to out-of-distribution traffic densities.

الذكاء الاصطناعي علم الروبوتات أنظمة وتحكم

Verification of indefinite-horizon POMDPs

361 - Alexander Bork , Sebastian Junges , Joost-Pieter Katoen 2020

The verification problem in MDPs asks whether, for any policy resolving the nondeterminism, the probability that something bad happens is bounded by some given threshold. This verification problem is often overly pessimistic, as the policies it consi ders may depend on the complete system state. This paper considers the verification problem for partially observable MDPs, in which the policies make their decisions based on (the history of) the observations emitted by the system. We present an abstraction-refinement framework extending previous instantiations of the Lovejoy-approach. Our experiments show that this framework significantly improves the scalability of the approach.

الذكاء الاصطناعي المنطق في علوم الحاسوب