
Enforcing Almost-Sure Reachability in POMDPs

Added by Nils Jansen
Publication date: 2020
Language: English





Partially observable Markov decision processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs by, for instance, restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: a novel SAT-based iterative approach and a decision-diagram-based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.
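The paper's SAT-based construction is not reproduced here, but the specification itself is easy to illustrate in the fully observable special case: for an MDP, the almost-sure reach-avoid winning region can be computed by a standard graph fixpoint over the supports of the transition distributions. The sketch below shows that baseline only; `states`, `succ`, `goal`, and `bad` are illustrative names, and the POMDP setting additionally requires observation-based policies, which this sketch does not handle.

```python
def almost_sure_winning_region(states, actions, succ, goal, bad):
    """Greatest-fixpoint computation of the MDP states from which some policy
    reaches `goal` with probability 1 without ever visiting `bad`.
    `succ[(s, a)]` is the support (set of possible successors) of action `a`
    in state `s`; the exact probabilities are not needed."""
    candidate = set(states) - set(bad)
    while True:
        # Allowed actions: the whole successor support stays inside the
        # candidate set, so bad states are avoided surely.
        allowed = {s: [a for a in actions
                       if all(t in candidate for t in succ[(s, a)])]
                   for s in candidate}
        # Backward reachability of the goal through allowed actions: states
        # with positive probability of making progress towards the goal.
        reach = set(goal) & candidate
        changed = True
        while changed:
            changed = False
            for s in candidate - reach:
                if any(reach & succ[(s, a)] for a in allowed[s]):
                    reach.add(s)
                    changed = True
        if reach == candidate:   # fixpoint reached
            return candidate
        candidate = reach
```

At the fixpoint, every remaining state has an action that keeps the process inside the region while the goal stays reachable from everywhere in it, which is the classical characterization of almost-sure reach-avoid winning in fully observable MDPs.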



Related research

Uncertain partially observable Markov decision processes (uPOMDPs) allow the probabilistic transition and observation functions of standard POMDPs to belong to a so-called uncertainty set. Such uncertainty, referred to as epistemic uncertainty, captures uncountable sets of probability distributions caused by, for instance, a lack of available data. We develop an algorithm to compute finite-memory policies for uPOMDPs that robustly satisfy specifications against any admissible distribution. In general, computing such policies is theoretically and practically intractable. We provide an efficient solution to this problem in four steps. (1) We state the underlying problem as a nonconvex optimization problem with infinitely many constraints. (2) A dedicated dualization scheme yields a dual problem that is still nonconvex but has finitely many constraints. (3) We linearize this dual problem and (4) solve the resulting finite linear program to obtain locally optimal solutions to the original problem. The resulting problem formulation is exponentially smaller than those resulting from existing methods. We demonstrate the applicability of our algorithm using large instances of an aircraft collision-avoidance scenario and a novel spacecraft motion planning case study.
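As a toy illustration of what "robustly satisfy against any admissible distribution" means (this is not the paper's dualization or linearization scheme), the inner adversarial step for a single state with an interval uncertainty set is itself a small linear program: pick the distribution in the set that minimizes the expected value-to-go. The interval bounds and values below are made up for the example.

```python
import numpy as np
from scipy.optimize import linprog

def worst_case_expectation(values, lower, upper):
    """Minimize sum_i p_i * values[i] over probability vectors p with
    lower[i] <= p_i <= upper[i] (an interval uncertainty set)."""
    n = len(values)
    res = linprog(c=np.asarray(values, dtype=float),
                  A_eq=np.ones((1, n)), b_eq=[1.0],      # p must sum to one
                  bounds=list(zip(lower, upper)),
                  method="highs")
    return res.fun

# Value-to-go of three successor states under an uncertain transition distribution.
print(worst_case_expectation(values=[1.0, 0.5, 0.0],
                             lower=[0.2, 0.1, 0.1],
                             upper=[0.6, 0.7, 0.5]))     # -> 0.35
```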
Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However, there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power.
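The core of the likelihood-weighting idea can be shown in a few lines (a generic particle-style sketch, not the POWSS estimator itself): instead of rejecting simulated particles whose sampled observation differs from the one actually received, every particle is kept and weighted by the likelihood of that observation. The generative functions `step` and `obs_likelihood` are placeholders.

```python
import numpy as np

def weighted_particle_update(particles, action, observation, step, obs_likelihood, rng):
    """Advance each particle through the generative model and weight it by the
    likelihood of the received observation (observation likelihood weighting),
    rather than discarding particles whose simulated observation mismatches."""
    next_particles = [step(s, action, rng) for s in particles]
    weights = np.array([obs_likelihood(observation, s) for s in next_particles])
    total = weights.sum()
    if total == 0.0:
        # Degenerate case: no particle explains the observation; fall back to uniform.
        weights = np.full(len(weights), 1.0 / len(weights))
    else:
        weights = weights / total
    return next_particles, weights
```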
Recently, Keating, Linden, and Wells [KLW] showed that the density of states measure of a nearest-neighbor quantum spin glass model is approximately Gaussian when the number of particles is large. The density of states measure is the ensemble average of the empirical spectral measure of a random matrix; in this paper, we use concentration of measure and entropy techniques together with the result of [KLW] to show that in fact, the empirical spectral measure of such a random matrix is almost surely approximately Gaussian itself, with no ensemble averaging. We also extend this result to a spherical quantum spin glass model and to the more general coupling geometries investigated by Erdős and Schröder.
The rapid development of autonomous vehicles (AVs) holds vast potential for transportation systems through improved safety, efficiency, and access to mobility. However, due to numerous technical, political, and human factors challenges, new methodologies are needed to design vehicles and transportation systems for these positive outcomes. This article tackles technical challenges arising from the partial adoption of autonomy: partial control, partial observation, complex multi-vehicle interactions, and the sheer variety of traffic settings represented by real-world networks. The article presents a modular learning framework which leverages deep Reinforcement Learning methods to address complex traffic dynamics. Modules are composed to capture common traffic phenomena (traffic jams, lane changing, intersections). Learned control laws are found to exceed human driving performance by at least 40% with only 5-10% adoption of AVs. In partially-observed single-lane traffic, a small neural network control law can eliminate stop-and-go traffic -- surpassing all known model-based controllers, achieving near-optimal performance, and generalizing to out-of-distribution traffic densities.
The verification problem in MDPs asks whether, for any policy resolving the nondeterminism, the probability that something bad happens is bounded by some given threshold. This verification problem is often overly pessimistic, as the policies it considers may depend on the complete system state. This paper considers the verification problem for partially observable MDPs, in which the policies make their decisions based on (the history of) the observations emitted by the system. We present an abstraction-refinement framework extending previous instantiations of the Lovejoy approach. Our experiments show that this framework significantly improves the scalability of the approach.
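Observation-based policies are usually analysed through the belief MDP, whose states are probability distributions over system states; grid-based (Lovejoy-style) abstractions approximate exactly this object. A minimal sketch of the underlying exact belief update, with an assumed array layout, looks as follows.

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """Exact Bayes update of a discrete belief b over states:
         b'(s') ~ O[action][s', observation] * sum_s T[action][s, s'] * b(s)
    with T[a] the |S|x|S| transition matrix of action a and O[a] the
    |S|x|Obs| observation-likelihood matrix after taking a."""
    predicted = belief @ T[action]                         # prediction step
    unnormalized = predicted * O[action][:, observation]   # correction step
    return unnormalized / unnormalized.sum()
```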
