No Arabic abstract
In this paper we present an algorithm to compute risk averse policies in Markov Decision Processes (MDP) when the total cost criterion is used together with the average value at risk (AVaR) metric. Risk averse policies are needed when large deviations from the expected behavior may have detrimental effects, and conventional MDP algorithms usually ignore this aspect. We provide conditions for the structure of the underlying MDP ensuring that approximations for the exact problem can be derived and solved efficiently. Our findings are novel inasmuch as average value at risk has not previously been considered in association with the total cost criterion. Our method is demonstrated in a rapid deployment scenario, whereby a robot is tasked with the objective of reaching a target location within a temporal deadline where increased speed is associated with increased probability of failure. We demonstrate that the proposed algorithm not only produces a risk averse policy reducing the probability of exceeding the expected temporal deadline, but also provides the statistical distribution of costs, thus offering a valuable analysis tool.
We study the minimization of a spectral risk measure of the total discounted cost generated by a Markov Decision Process (MDP) over a finite or infinite planning horizon. The MDP is assumed to have Borel state and action spaces and the cost function may be unbounded above. The optimization problem is split into two minimization problems using an infimum representation for spectral risk measures. We show that the inner minimization problem can be solved as an ordinary MDP on an extended state space and give sufficient conditions under which an optimal policy exists. Regarding the infinite dimensional outer minimization problem, we prove the existence of a solution and derive an algorithm for its numerical approximation. Our results include the findings in Bauerle and Ott (2011) in the special case that the risk measure is Expected Shortfall. As an application, we present a dynamic extension of the classical static optimal reinsurance problem, where an insurance company minimizes its cost of capital.
This paper studies average-cost Markov decision processes with semi-uniform Feller transition probabilities. This class of MDPs was recently introduced by the authors to study MDPs with incomplete information. This paper studies the validity of optimality inequalities, the existence of optimal policies, and the approximations of optimal policies by policies optimizing total discounted costs.
In this paper we study a class of risk-sensitive Markovian control problems in discrete time subject to model uncertainty. We consider a risk-sensitive discounted cost criterion with finite time horizon. The used methodology is the one of adaptive robust control combined with machine learning.
We introduce and treat a class of Multi Objective Risk-Sensitive Markov Decision Processes (MORSMDPs), where the optimality criteria are generated by a multivariate utility function applied on a finite set of emph{different running costs}. To illustrate our approach, we study the example of a two-armed bandit problem. In the sequel, we show that it is possible to reformulate standard Risk-Sensitive Partially Observable Markov Decision Processes (RSPOMDPs), where risk is modeled by a utility function that is a emph{sum of exponentials}, as MORSMDPs that can be solved with the methods described in the first part. This way, we extend the treatment of RSPOMDPs with exponential utility to RSPOMDPs corresponding to a qualitatively bigger family of utility functions.
The standard approach to risk-averse control is to use the Exponential Utility (EU) functional, which has been studied for several decades. Like other risk-averse utility functionals, EU encodes risk aversion through an increasing convex mapping $varphi$ of objective costs to subjective costs. An objective cost is a realization $y$ of a random variable $Y$. In contrast, a subjective cost is a realization $varphi(y)$ of a random variable $varphi(Y)$ that has been transformed to measure preferences about the outcomes. For EU, the transformation is $varphi(y) = exp(frac{-theta}{2}y)$, and under certain conditions, the quantity $varphi^{-1}(E(varphi(Y)))$ can be approximated by a linear combination of the mean and variance of $Y$. More recently, there has been growing interest in risk-averse control using the Conditional Value-at-Risk (CVaR) functional. In contrast to the EU functional, the CVaR of a random variable $Y$ concerns a fraction of its possible realizations. If $Y$ is a continuous random variable with finite $E(|Y|)$, then the CVaR of $Y$ at level $alpha$ is the expectation of $Y$ in the $alpha cdot 100 %$ worst cases. Here, we study the applications of risk-averse functionals to controller synthesis and safety analysis through the development of numerical examples, with emphasis on EU and CVaR. Our contribution is to examine the decision-theoretic, mathematical, and computational trade-offs that arise when using EU and CVaR for optimal control and safety analysis. We are hopeful that this work will advance the interpretability and elucidate the potential benefits of risk-averse control technology.