بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Average Cost Markov Decision Processes with Semi-Uniform Feller Transition Probabilities

114 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Eugene Feinberg

تاريخ النشر 2021

مجال البحث

والبحث باللغة English

تأليف Eugene A. Feinberg - Pavlo O. Kasyanov - Michael Z. Zgurovsky

التحسين والتحكم

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This paper studies average-cost Markov decision processes with semi-uniform Feller transition probabilities. This class of MDPs was recently introduced by the authors to study MDPs with incomplete information. This paper studies the validity of optimality inequalities, the existence of optimal policies, and the approximations of optimal policies by policies optimizing total discounted costs.

قيم البحث

106 - Eugene A. Feinberg , Pavlo O. Kasyanov , Michael Z. Zgurovsky 2021

This paper deals with control of partially observable discrete-time stochastic systems. It introduces and studies the class of Markov Decision Processes with Incomplete information and with semi-uniform Feller transition probabilities. The important feature of this class of models is that the classic reduction of such a model with incomplete observation to the completely observable Markov Decision Process with belief states preserves semi-uniform Feller continuity of transition probabilities. Under mild assumptions on cost functions, optimal policies exist, optimality equations hold, and value iterations converge to optimal values for this class of models. In particular, for Partially Observable Markov Decision Processes the results of this paper imply new and generalize several known sufficient conditions on transition and observation probabilities for the existence of optimal policies, validity of optimality equations, and convergence of value iterations.

التحسين والتحكم

Risk Aversion in Finite Markov Decision Processes Using Total Cost Criteria and Average Value at Risk

229 - Stefano Carpin , Yin-Lam Chow , Marco Pavone 2016

In this paper we present an algorithm to compute risk averse policies in Markov Decision Processes (MDP) when the total cost criterion is used together with the average value at risk (AVaR) metric. Risk averse policies are needed when large deviation s from the expected behavior may have detrimental effects, and conventional MDP algorithms usually ignore this aspect. We provide conditions for the structure of the underlying MDP ensuring that approximations for the exact problem can be derived and solved efficiently. Our findings are novel inasmuch as average value at risk has not previously been considered in association with the total cost criterion. Our method is demonstrated in a rapid deployment scenario, whereby a robot is tasked with the objective of reaching a target location within a temporal deadline where increased speed is associated with increased probability of failure. We demonstrate that the proposed algorithm not only produces a risk averse policy reducing the probability of exceeding the expected temporal deadline, but also provides the statistical distribution of costs, thus offering a valuable analysis tool.

التحسين والتحكم

Markov Decision Processes with Long-Term Average Constraints

104 - Mridul Agarwal , Qinbo Bai , 2021

We consider the problem of constrained Markov Decision Process (CMDP) where an agent interacts with a unichain Markov Decision Process. At every interaction, the agent obtains a reward. Further, there are $K$ cost functions. The agent aims to maximiz e the long-term average reward while simultaneously keeping the $K$ long-term average costs lower than a certain threshold. In this paper, we propose CMDP-PSRL, a posterior sampling based algorithm using which the agent can learn optimal policies to interact with the CMDP. Further, for MDP with $S$ states, $A$ actions, and diameter $D$, we prove that following CMDP-PSRL algorithm, the agent can bound the regret of not accumulating rewards from optimal policy by $Tilde{O}(poly(DSA)sqrt{T})$. Further, we show that the violations for any of the $K$ constraints is also bounded by $Tilde{O}(poly(DSA)sqrt{T})$. To the best of our knowledge, this is the first work which obtains a $Tilde{O}(sqrt{T})$ regret bounds for ergodic MDPs with long-term average constraints.

التعلم الآلي الذكاء الاصطناعي أنظمة وتحكم

Semi-Uniform Feller Stochastic Kernels

277 - Eugene A. Feinberg , Pavlo O. Kasyanov , Michael Z. Zgurovsky 2021

This paper studies transition probabilities from a Borel subset of a Polish space to a product of two Borel subsets of Polish spaces. For such transition probabilities it introduces and studies semi-uniform Feller continuity and a weaker property cal led WTV-continuity. This paper provides several equivalent definitions of semi-uniform Feller continuity and describes the preservation property of WTV-continuity under integration. The motivation for this study came from the theory of Markov decision processes with incomplete information, and this paper provides fundamental results useful for this theory.

الاحتمالات

Constrained discounted Markov decision processes with Borel state spaces

81 - Eugene A. Feinberg , Anna Jaskiewicz , Andrzej S. Nowak 2018

We study discrete-time discounted constrained Markov decision processes (CMDPs) on Borel spaces with unbounded reward functions. In our approach the transition probability functions are weakly or set-wise continuous. The reward functions are upper se micontinuous in state-action pairs or semicontinuous in actions. Our aim is to study models with unbounded reward functions, which are often encountered in applications, e.g., in consumption/investment problems. We provide some general assumptions under which the optimization problems in CMDPs are solvable in the class of stationary randomized policies. Then, we indicate that if the initial distribution and transition probabilities are non-atomic, then using a general purification result of Feinberg and Piunovskiy, stationary optimal policies can be deterministic. Our main results are illustrated by five examples.

التحسين والتحكم

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

المعهد العالي للعلوم التطبيقية والتكنولوجيا

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Average Cost Markov Decision Processes with Semi-Uniform Feller Transition Probabilities

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً