
Parameter-Independent Strategies for pMDPs via POMDPs

Published by: Sebastian Arming
Publication date: 2018
Research field: Informatics Engineering
Paper language: English

Markov Decision Processes (MDPs) are a popular class of models for solving control decision problems in probabilistic reactive systems. We consider parametric MDPs (pMDPs), which include parameters in some of the transition probabilities to account for stochastic uncertainties of the environment such as noise or input disturbances. We study pMDPs with reachability objectives where the parameter values are unknown and cannot be measured directly during execution, but a probability distribution over the parameter values is known. We study, for the first time, the computation of parameter-independent strategies that are expectation optimal, i.e., that optimize the expected reachability probability under the probability distribution over the parameters. We present an encoding of our problem into partially observable MDPs (POMDPs), i.e., a reduction of our problem to computing optimal strategies in POMDPs. We evaluate our method experimentally on several benchmarks: a motivating (repeated) learner model; a series of benchmarks of varying configurations of a robot moving on a grid; and a consensus protocol.
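As a rough illustration of the encoding into POMDPs described above, the following sketch builds a POMDP whose hidden state component is the parameter valuation, so that observation-based strategies of the POMDP correspond to parameter-independent strategies of the pMDP. It assumes a finite set of parameter valuations with a known prior; all names (the PMDP fields and helper functions) are illustrative assumptions, not the paper's implementation.

# Minimal sketch, not the paper's code: encode a pMDP plus a finite prior over
# parameter valuations as a POMDP with the valuation as hidden state.
def pmdp_to_pomdp(pmdp, prior):
    # product states: (pMDP state, parameter valuation); the valuation is drawn
    # once from the prior at the start and never changes afterwards
    states = [(s, v) for s in pmdp.states for v in prior]
    init = {(pmdp.init, v): p for v, p in prior.items()}
    def trans(src, action, dst):
        (s, v), (t, w) = src, dst
        # instantiate the parametric transition probability at valuation v
        return pmdp.prob(s, action, t, v) if v == w else 0.0
    def obs(state):
        s, v = state
        return s          # only the pMDP state is observable; v stays hidden
    return states, init, trans, obs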


Read also

We study finite-state controllers (FSCs) for partially observable Markov decision processes (POMDPs) that are provably correct with respect to given specifications. The key insight is that computing (randomised) FSCs on POMDPs is equivalent to, and computationally as hard as, synthesis for parametric Markov chains (pMCs). This correspondence allows us to use tools for parameter synthesis in pMCs to compute correct-by-construction FSCs on POMDPs for a variety of specifications. Our experimental evaluation shows performance comparable to well-known POMDP solvers.
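A small sketch of this correspondence, under the assumption of a fixed memory bound k and with an invented pomdp interface: keeping the FSC's action and memory-update probabilities symbolic turns the product of the POMDP and the controller into a parametric Markov chain whose transition probabilities are polynomials in those symbols.

# Sketch only; the pomdp attributes used here are assumptions for illustration.
import sympy

def induced_pmc(pomdp, k):
    # p[n, z, a]: probability of action a in memory node n under observation z
    # q[n, z, m]: probability of switching to memory node m
    p = {(n, z, a): sympy.Symbol(f"p_{n}_{z}_{a}")
         for n in range(k) for z in pomdp.observations for a in pomdp.actions}
    q = {(n, z, m): sympy.Symbol(f"q_{n}_{z}_{m}")
         for n in range(k) for z in pomdp.observations for m in range(k)}
    def prob(s, n, t, m):
        z = pomdp.obs(s)
        # transition probability of the product chain, polynomial in p and q
        return sum(p[n, z, a] * pomdp.prob(s, a, t) * q[n, z, m]
                   for a in pomdp.actions)
    return prob   # parameter synthesis on this pMC yields an FSC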
Xiaobin Zhang, Bo Wu (2017)
Partially Observable Markov Decision Processes (POMDPs) are widely used to model probabilistic behavior of complex systems. Compared with MDPs, a POMDP models a system more accurately, but solving a POMDP generally takes time exponential in the size of its state space. This makes the formal verification and synthesis problems much more challenging for POMDPs, especially when multiple system components are involved. As a promising technique to reduce verification complexity, abstraction tries to find an abstract system with a smaller state space that preserves enough properties for the verification purpose. While abstraction-based verification has been explored extensively for MDPs, in this paper we present the first result on POMDP abstraction and its refinement techniques. The main idea follows the counterexample-guided abstraction refinement (CEGAR) framework. Starting with a coarse guess for the POMDP abstraction, we iteratively use counterexamples from formal verification to refine the abstraction until the abstract system can be used to infer the verification result for the original POMDP. Our main contributions are twofold: 1) we propose a novel abstract system model for POMDPs and a new simulation relation that captures partial observability, and prove preservation of a fragment of Probabilistic Computation Tree Logic (PCTL); 2) to find a proper abstract system that can prove or disprove the satisfaction relation on the concrete POMDP, we develop a novel refinement algorithm. Our work leads to a sound and complete CEGAR framework for POMDPs.
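The CEGAR loop described in that abstract might look roughly as follows; the helper functions (initial_abstraction, model_check, realizable_on, refine) are placeholders standing in for the paper's constructions, not code from it.

# Hypothetical CEGAR-style loop for POMDP abstraction refinement.
def cegar_verify(pomdp, spec):
    abstraction = initial_abstraction(pomdp)              # coarse first guess
    while True:
        holds, counterexample = model_check(abstraction, spec)
        if holds:
            return True    # property transfers to the concrete POMDP via the simulation relation
        if realizable_on(pomdp, counterexample):
            return False   # genuine violation in the concrete POMDP
        abstraction = refine(abstraction, counterexample)  # split abstract states to rule it out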
In asynchronous games, Melliès proved that innocent strategies are positional: their behaviour only depends on the position, not the temporal order used to reach it. This insightful result shaped our understanding of the link between dynamic (i.e. game) and static (i.e. relational) semantics. In this paper, we investigate the positionality of innocent strategies in the traditional setting of Hyland-Ong-Nickau-Coquand pointer games. We show that though innocent strategies are not positional, total finite innocent strategies still enjoy a key consequence of positionality, namely positional injectivity: they are entirely determined by their positions. Unfortunately, this does not hold in general: we show a counterexample if finiteness and totality are lifted. For finite partial strategies we leave the problem open; we show however the partial result that two strategies with the same positions must have the same P-views of maximal length.
Hein Duijf (2016)
Quite some work in the ATL tradition uses the differences between various types of strategies (positional, uniform, perfect recall) to give alternative semantics to the same logical language. This paper contributes to another perspective on strategy types, one where we characterise the differences between them on the syntactic (object language) level. This is important for a more traditional knowledge representation view on strategic content. Leaving differences between strategy types implicit in the semantics is a sensible idea if the goal is to use the strategic formalism for model checking. But for traditional knowledge representation in terms of object-language-level formulas, we need to extend the language. This paper introduces a strategic STIT syntax with explicit operators for knowledge that allows us to characterise strategy types. This more expressive strategic language is interpreted on standard ATL-type concurrent epistemic game structures. We introduce rule-based strategies in our language and fruitfully apply them to the representation and characterisation of positional and uniform strategies. Our representations highlight crucial conditions to be met for strategy types. We demonstrate the usefulness of our work by showing that it leads to a critical reexamination of coalitional uniform strategies.
We study the problem of policy synthesis for uncertain partially observable Markov decision processes (uPOMDPs). The transition probability function of uPOMDPs is only known to belong to a so-called uncertainty set, for instance in the form of probability intervals. Such a model arises when, for example, an agent operates under information limitation due to imperfect knowledge about the accuracy of its sensors. The goal is to compute a policy for the agent that is robust against all possible probability distributions within the uncertainty set. In particular, we are interested in a policy that robustly ensures the satisfaction of temporal logic and expected reward specifications. We state the underlying optimization problem as a semi-infinite quadratically-constrained quadratic program (QCQP), which has finitely many variables and infinitely many constraints. Since QCQPs are non-convex in general and practically infeasible to solve, we resort to the so-called convex-concave procedure to convexify the QCQP. Even though convex, the resulting optimization problem still has infinitely many constraints and is NP-hard. For uncertainty sets that form convex polytopes, we provide a transformation of the problem to a convex QCQP with finitely many constraints. We demonstrate the feasibility of our approach by means of several case studies that highlight typical bottlenecks for our problem. In particular, we show that we are able to solve benchmarks with hundreds of thousands of states, hundreds of different observations, and we investigate the effect of different levels of uncertainty in the models.
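The convex-concave procedure mentioned in that abstract can be illustrated on a toy difference-of-convex problem. This is not the paper's QCQP and the data below is made up; it only shows the basic step of replacing the concave part of the objective by its linearization at the current iterate, which is how the non-convex quadratic terms are convexified.

# Toy convex-concave procedure (CCP): minimize ||A x - b||^2 - ||x||^2 over a box.
# The concave term -||x||^2 is linearized around the current iterate x_k, giving
# one convex subproblem per iteration.
import numpy as np
import cvxpy as cp

A = np.array([[1.0, 2.0], [0.5, -1.0]])   # illustrative data only
b = np.array([1.0, 0.0])
x_k = np.zeros(2)                          # current iterate
for _ in range(20):
    x = cp.Variable(2)
    # first-order model of -||x||^2 at x_k (linear, hence convex, surrogate)
    concave_lin = -float(x_k @ x_k) - 2.0 * (x_k @ (x - x_k))
    prob = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b) + concave_lin),
                      [x >= -5, x <= 5])
    prob.solve()
    x_k = x.value
print(x_k)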
