Optimistic Value Iteration


الملخص بالإنكليزية

Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides a lower bound on unbounded probabilities or reward values. Two sound variations, which also deliver an upper bound, have recently appeared. In this paper, we present optimistic value iteration, a new sound approach that leverages value iterations ability to usually deliver tight lower bounds: we obtain a lower bound via standard value iteration, use the result to guess an upper bound, and prove the latters correctness. Optimistic value iteration is easy to implement, does not require extra precomputations or a priori state space transformations, and works for computing reachability probabilities as well as expected rewards. It is also fast, as we show via an extensive experimental evaluation using our publicly available implementation within the Modest Toolset.

تحميل البحث