Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides a lower bound on unbounded probabilities or reward values. Two sound variations, which also deliver an upper bound, have recently appeared. In this paper, we present optimistic value iteration, a new sound approach that leverages value iteration's ability to usually deliver tight lower bounds: we obtain a lower bound via standard value iteration, use the result to guess an upper bound, and prove the latter's correctness. Optimistic value iteration is easy to implement, does not require extra precomputations or a priori state space transformations, and works for computing reachability probabilities as well as expected rewards. It is also fast, as we show via an extensive experimental evaluation using our publicly available implementation within the Modest Toolset.
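The guess-and-verify loop described above can be sketched in a few lines. The Python sketch below treats maximal reachability probabilities on a small explicit-state MDP; the dictionary encoding of transitions, the reuse of the same eps for the stopping criterion and the optimistic bump, and the halve-and-retry refinement are illustrative assumptions, not the actual implementation in the Modest Toolset.

```python
def bellman(V, transitions):
    """One Bellman step for maximal reachability:
    V'(s) = max_a sum_{s'} P(s' | s, a) * V(s')."""
    return {s: max(sum(p * V[t] for t, p in dist.items()) for dist in acts.values())
            for s, acts in transitions.items()}

def optimistic_value_iteration(transitions, targets, eps=1e-6):
    """Guess-and-verify sketch: iterate from below with standard value iteration,
    then try to certify the optimistic guess V + eps via one Bellman step."""
    V = {s: (1.0 if s in targets else 0.0) for s in transitions}
    while True:
        # Phase 1: standard value iteration from below until the change is below eps.
        while True:
            W = bellman(V, transitions)
            for t in targets:
                W[t] = 1.0
            delta = max(abs(W[s] - V[s]) for s in V)
            V = W
            if delta < eps:
                break
        # Phase 2: optimistic guess, capped at 1 since these are probabilities.
        U = {s: min(V[s] + eps, 1.0) for s in V}
        BU = bellman(U, transitions)
        for t in targets:
            BU[t] = 1.0
        if all(BU[s] <= U[s] + 1e-12 for s in U):
            return V, U        # Phi(U) <= U certifies U as a sound upper bound
        eps /= 2               # guess not inductive: refine the lower bound and retry

# Tiny hypothetical MDP: in s0, action "a" reaches the goal more often than "b".
mdp = {
    "s0": {"a": {"s0": 0.5, "goal": 0.5}, "b": {"s0": 0.9, "goal": 0.1}},
    "goal": {"stay": {"goal": 1.0}},
}
lo, hi = optimistic_value_iteration(mdp, targets={"goal"})
```

The soundness of phase 2 rests on monotonicity: if the guessed vector U satisfies Phi(U) <= U componentwise, it is a pre-fixed point of the Bellman operator and therefore bounds the least fixed point, i.e. the true maximal reachability probabilities, from above.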
Computing reachability probabilities is at the heart of probabilistic model checking. All model checkers compute these probabilities in an iterative fashion using value iteration. This technique approximates a fixed point from below by determining reachability probabilities for an increasing number of steps.
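The "fixed point from below" view can be made concrete: starting from the zero vector, the k-th iterate of the underlying operator is exactly the probability of reaching the target within k steps, and these iterates increase monotonically towards the unbounded reachability probability. A small sketch on a hypothetical four-state Markov chain (the transition matrix and step count are made up for illustration):

```python
import numpy as np

# Hypothetical Markov chain: states s0, s1, target, sink; target and sink absorbing.
P = np.array([[0.5, 0.3, 0.1, 0.1],
              [0.0, 0.4, 0.5, 0.1],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
target = 2

x = np.zeros(4)        # x_k(s) = Pr(reach target within k steps from s)
x[target] = 1.0
for k in range(1, 11):
    x_next = P @ x     # one more step of the fixed-point iteration
    x_next[target] = 1.0
    assert np.all(x_next >= x - 1e-12)   # iterates increase monotonically from below
    x = x_next
    print(k, np.round(x, 4))
```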
We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning.
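The planning module of a VIN unrolls value-iteration updates over a spatial state space, implemented as convolution and max-pooling layers with learned weights so that the whole computation stays differentiable. The NumPy sketch below shows only the fixed, non-learned recurrence that such a module approximates on a gridworld; the grid size, rewards, discount factor, and four-neighbour action set are illustrative assumptions.

```python
import numpy as np

H, W = 5, 5
reward = np.full((H, W), -0.1)   # small step cost everywhere (illustrative)
reward[4, 4] = 1.0               # goal cell (illustrative)

V = np.zeros((H, W))
for _ in range(20):              # K unrolled "planning" iterations
    # Value of each neighbouring cell for every grid position; edges clamp to the
    # border, i.e. bumping into a wall leaves the agent in place.
    up    = np.vstack([V[:1],  V[:-1]])
    down  = np.vstack([V[1:],  V[-1:]])
    left  = np.hstack([V[:, :1], V[:, :-1]])
    right = np.hstack([V[:, 1:], V[:, -1:]])
    # Q-value for each of the 4 moves: local reward plus discounted neighbour value.
    Q = reward[None] + 0.95 * np.stack([up, down, left, right])
    V = Q.max(axis=0)            # max over actions, which a VIN realises via max-pooling

print(np.round(V, 2))            # values increase towards the goal cell
```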
Value iteration is a powerful yet inefficient algorithm for Markov decision processes (MDPs) because it puts the majority of its effort into backing up the entire state space, which turns out to be unnecessary in many cases. In order to overcome this
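One generic way to avoid sweeping the entire state space is to back up only states whose Bellman residual is currently large, propagating priorities to their predecessors. The prioritized-sweeping-style sketch below is a standard illustration of this idea, not the specific approach proposed in the abstract above; the explicit dictionary encoding of the MDP is assumed for illustration.

```python
import heapq

def prioritized_value_iteration(transitions, rewards, gamma=0.95, theta=1e-6):
    """Back up only states with a large Bellman residual instead of sweeping
    the whole state space in every iteration (generic prioritized-sweeping sketch)."""
    V = {s: 0.0 for s in transitions}
    # Predecessors: which states can reach s in one step under some action?
    preds = {s: set() for s in transitions}
    for s, acts in transitions.items():
        for dist in acts.values():
            for t in dist:
                preds[t].add(s)

    def backup(s):
        return max(rewards[s] + gamma * sum(p * V[t] for t, p in dist.items())
                   for dist in transitions[s].values())

    heap = [(-abs(backup(s) - V[s]), s) for s in transitions]
    heapq.heapify(heap)
    while heap:
        prio, s = heapq.heappop(heap)
        if -prio < theta:
            break                      # all remaining residuals are small enough
        V[s] = backup(s)
        for p_state in preds[s]:       # only predecessors can have changed residuals
            res = abs(backup(p_state) - V[p_state])
            if res >= theta:
                heapq.heappush(heap, (-res, p_state))
    return V
```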
Iterative algorithms are traditionally expressed in ACL2 using recursion. On the other hand, Common Lisp provides a construct, loop, which -- like most programming languages -- provides direct support for iteration. We describe an ACL2 analogue loop$
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well. Commonly, the optimal policy overfits to the approximate model and the corresponding state-distribut