
Risk Aversion in Finite Markov Decision Processes Using Total Cost Criteria and Average Value at Risk

Published by Yinlam Chow
Publication date: 2016
Language: English





In this paper we present an algorithm to compute risk-averse policies in Markov Decision Processes (MDPs) when the total cost criterion is used together with the average value at risk (AVaR) metric. Risk-averse policies are needed when large deviations from the expected behavior may have detrimental effects, an aspect that conventional MDP algorithms usually ignore. We provide conditions on the structure of the underlying MDP ensuring that approximations for the exact problem can be derived and solved efficiently. Our findings are novel inasmuch as average value at risk has not previously been considered in association with the total cost criterion. Our method is demonstrated in a rapid deployment scenario, in which a robot must reach a target location within a temporal deadline and where increased speed is associated with an increased probability of failure. We demonstrate that the proposed algorithm not only produces a risk-averse policy reducing the probability of exceeding the expected temporal deadline, but also provides the statistical distribution of costs, thus offering a valuable analysis tool.
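As an illustration of the AVaR metric the paper builds on, the sketch below computes the average value at risk of a sample of total costs, i.e. the mean of the worst alpha-fraction of outcomes. This is a minimal sketch, not the paper's algorithm; the function name and the cost samples are hypothetical.

```python
import numpy as np

def average_value_at_risk(costs, alpha):
    """AVaR (a.k.a. CVaR / Expected Shortfall) of a sample of costs:
    the mean of the worst alpha-fraction of outcomes, where larger
    cost means worse."""
    costs = np.sort(np.asarray(costs, dtype=float))[::-1]  # worst first
    k = max(1, int(np.ceil(alpha * len(costs))))           # tail size
    return costs[:k].mean()

# Hypothetical total-cost samples from rolling out a policy:
samples = [3.0, 3.5, 4.0, 4.0, 9.0, 4.5, 3.0, 10.0, 4.0, 3.5]
print(average_value_at_risk(samples, alpha=0.2))  # mean of the 2 worst costs: 9.5
```

At alpha = 1 the metric reduces to the ordinary expected cost, which is why AVaR is often described as interpolating between risk-neutral and worst-case optimization.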




Read also

We study the minimization of a spectral risk measure of the total discounted cost generated by a Markov Decision Process (MDP) over a finite or infinite planning horizon. The MDP is assumed to have Borel state and action spaces and the cost function may be unbounded above. The optimization problem is split into two minimization problems using an infimum representation for spectral risk measures. We show that the inner minimization problem can be solved as an ordinary MDP on an extended state space and give sufficient conditions under which an optimal policy exists. Regarding the infinite dimensional outer minimization problem, we prove the existence of a solution and derive an algorithm for its numerical approximation. Our results include the findings in Bauerle and Ott (2011) in the special case that the risk measure is Expected Shortfall. As an application, we present a dynamic extension of the classical static optimal reinsurance problem, where an insurance company minimizes its cost of capital.
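For the Expected Shortfall special case mentioned above, the infimum representation takes the well-known Rockafellar-Uryasev form, which the sketch below checks numerically against a direct tail-average computation. The cost samples and the grid-search minimization are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def cvar_direct(y, alpha):
    # Expected Shortfall as the mean of the worst alpha-fraction
    # of outcomes (largest costs first).
    y = np.sort(np.asarray(y))[::-1]
    k = int(np.ceil(alpha * len(y)))
    return y[:k].mean()

def cvar_infimum(y, alpha):
    # Infimum representation (Rockafellar-Uryasev):
    #   ES_alpha(Y) = inf_q { q + E[(Y - q)^+] / alpha },
    # approximated here by minimizing over a grid of thresholds q.
    y = np.asarray(y)
    qs = np.linspace(y.min(), y.max(), 1001)
    vals = qs + np.maximum(y[:, None] - qs[None, :], 0.0).mean(axis=0) / alpha
    return vals.min()

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=2000)  # hypothetical cost samples
print(cvar_direct(y, 0.1), cvar_infimum(y, 0.1))  # the two values nearly agree
```

The outer objective is convex in q, which is what makes splitting the problem into an inner MDP and an outer minimization tractable in this line of work.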
This paper studies average-cost Markov decision processes with semi-uniform Feller transition probabilities. This class of MDPs was recently introduced by the authors to study MDPs with incomplete information. The paper establishes the validity of optimality inequalities, the existence of optimal policies, and approximations of optimal policies by policies optimizing total discounted costs.
In this paper we study a class of risk-sensitive Markovian control problems in discrete time subject to model uncertainty. We consider a risk-sensitive discounted cost criterion with finite time horizon. The used methodology is the one of adaptive robust control combined with machine learning.
We introduce and treat a class of Multi-Objective Risk-Sensitive Markov Decision Processes (MORSMDPs), where the optimality criteria are generated by a multivariate utility function applied on a finite set of *different running costs*. To illustrate our approach, we study the example of a two-armed bandit problem. In the sequel, we show that it is possible to reformulate standard Risk-Sensitive Partially Observable Markov Decision Processes (RSPOMDPs), where risk is modeled by a utility function that is a *sum of exponentials*, as MORSMDPs that can be solved with the methods described in the first part. This way, we extend the treatment of RSPOMDPs with exponential utility to RSPOMDPs corresponding to a qualitatively bigger family of utility functions.
The standard approach to risk-averse control is to use the Exponential Utility (EU) functional, which has been studied for several decades. Like other risk-averse utility functionals, EU encodes risk aversion through an increasing convex mapping $\varphi$ of objective costs to subjective costs. An objective cost is a realization $y$ of a random variable $Y$. In contrast, a subjective cost is a realization $\varphi(y)$ of a random variable $\varphi(Y)$ that has been transformed to measure preferences about the outcomes. For EU, the transformation is $\varphi(y) = \exp(\frac{-\theta}{2}y)$, and under certain conditions, the quantity $\varphi^{-1}(E(\varphi(Y)))$ can be approximated by a linear combination of the mean and variance of $Y$. More recently, there has been growing interest in risk-averse control using the Conditional Value-at-Risk (CVaR) functional. In contrast to the EU functional, the CVaR of a random variable $Y$ concerns a fraction of its possible realizations. If $Y$ is a continuous random variable with finite $E(|Y|)$, then the CVaR of $Y$ at level $\alpha$ is the expectation of $Y$ in the $\alpha \cdot 100\%$ worst cases. Here, we study the applications of risk-averse functionals to controller synthesis and safety analysis through the development of numerical examples, with emphasis on EU and CVaR. Our contribution is to examine the decision-theoretic, mathematical, and computational trade-offs that arise when using EU and CVaR for optimal control and safety analysis. We are hopeful that this work will advance the interpretability and elucidate the potential benefits of risk-averse control technology.
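The two functionals discussed above can be contrasted on a common sample. The sketch below computes the EU certainty equivalent $\varphi^{-1}(E(\varphi(Y)))$ and the CVaR of a hypothetical cost distribution; the distribution, $\theta$, and $\alpha$ are illustrative choices, and $\theta < 0$ is assumed so that $\varphi$ is increasing and convex as the text requires.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical cost distribution (not from the paper): lognormal costs.
Y = rng.lognormal(mean=0.0, sigma=0.5, size=50_000)

# Exponential Utility: varphi(y) = exp((-theta/2) * y). Taking theta < 0
# makes varphi increasing and convex, i.e. risk-averse over costs.
theta = -0.5
subjective = np.exp((-theta / 2.0) * Y)          # varphi(Y)
ce = (-2.0 / theta) * np.log(subjective.mean())  # varphi^{-1}(E[varphi(Y)])

# CVaR at level alpha: the expectation of Y in the alpha*100% worst cases.
alpha = 0.05
var = np.quantile(Y, 1.0 - alpha)  # Value-at-Risk threshold
cvar = Y[Y >= var].mean()

print(Y.mean(), ce, cvar)  # CE exceeds the mean; CVaR looks only at the tail
```

By Jensen's inequality the certainty equivalent is at least the mean cost, while CVaR ignores everything outside the $\alpha$-tail; this is exactly the decision-theoretic trade-off the abstract examines.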