
On the policy improvement algorithm for ergodic risk-sensitive control

Published by: Ari Arapostathis
Publication date: 2019
Paper language: English





In this article we consider the ergodic risk-sensitive control problem for a large class of multidimensional controlled diffusions on the whole space. We study the minimization and maximization problems under either a blanket stability hypothesis, or a near-monotone assumption on the running cost. We establish the convergence of the policy improvement algorithm for these models. We also present a more general result concerning the region of attraction of the equilibrium of the algorithm.
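The policy improvement scheme the paper analyses has a simple finite-state analogue that may help fix ideas. The following is a hedged sketch, not the authors' construction: in a finite MDP, the ergodic risk-sensitive cost of a stationary policy is the logarithm of the Perron root of a "twisted" kernel, so policy evaluation becomes a Perron eigenproblem and improvement minimizes the multiplicative Bellman operator. All names and array conventions here are illustrative.

```python
import numpy as np

def risk_sensitive_policy_improvement(P, c, n_iter=50):
    """Policy improvement for the ergodic risk-sensitive minimization
    problem on a finite MDP -- a discrete-state sketch of the scheme
    studied for controlled diffusions in the paper.

    P : (A, S, S) array of transition kernels, c : (S, A) running cost.
    Returns a policy and its risk-sensitive value log(rho).
    """
    A, S, _ = P.shape
    pi = np.zeros(S, dtype=int)          # initial stationary Markov policy
    for _ in range(n_iter):
        # Policy evaluation: Perron eigenpair of the twisted kernel
        # M_pi(x, y) = exp(c(x, pi(x))) * P(y | x, pi(x)).
        M = np.exp(c[np.arange(S), pi])[:, None] * P[pi, np.arange(S), :]
        evals, evecs = np.linalg.eig(M)
        k = np.argmax(evals.real)
        rho = evals.real[k]              # Perron root of the current policy
        V = np.abs(evecs[:, k].real)     # positive Perron eigenvector
        # Policy improvement: minimize the multiplicative Bellman operator.
        Q = np.exp(c.T) * (P @ V)        # Q[a, x] = e^{c(x,a)} sum_y P(y|x,a) V(y)
        pi_new = np.argmin(Q, axis=0)
        if np.array_equal(pi_new, pi):
            break
        pi = pi_new
    return pi, np.log(rho)
```

In this finite, irreducible setting each improvement step does not increase the Perron root; the paper's contribution is establishing the analogous convergence for diffusions on the whole space, where the eigenproblem is a second-order PDE.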



Related research

We consider a large family of discrete and continuous time controlled Markov processes and study an ergodic risk-sensitive minimization problem. Under a blanket stability assumption, we provide a complete analysis of this problem. In particular, we establish uniqueness of the value function and a verification result for optimal stationary Markov controls, in addition to the existence results. We also revisit this problem under a near-monotonicity condition but without any stability hypothesis. Our results also include policy improvement algorithms in both the discrete and continuous time frameworks.
Paul Dupuis, Vaios Laschos, 2018
We study sequences, parametrized by the number of agents, of many agent exit time stochastic control problems with risk-sensitive cost structure. We identify a fully characterizing assumption, under which each of such control problem corresponds to a risk-neutral stochastic control problem with additive cost, and sequentially to a risk-neutral stochastic control problem on the simplex, where the specific information about the state of each agent can be discarded. We also prove that, under some additional assumptions, the sequence of value functions converges to the value function of a deterministic control problem, which can be used for the design of nearly optimal controls for the original problem, when the number of agents is sufficiently large.
A multiplicative relative value iteration algorithm for solving the dynamic programming equation for the risk-sensitive control problem is studied for discrete time controlled Markov chains with a compact Polish state space, and for controlled diffusions on the whole Euclidean space. The main result is a proof of convergence to the desired limit in each case.
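For intuition, the multiplicative relative value iteration described above can be sketched on a finite MDP as a normalized power iteration on the multiplicative Bellman operator; the normalizing factor then converges to the Perron root whose logarithm is the optimal ergodic risk-sensitive cost. This is a toy analogue under simplifying assumptions, not the cited construction on Polish or Euclidean state spaces.

```python
import numpy as np

def multiplicative_rvi(P, c, x0=0, n_iter=1000):
    """Multiplicative relative value iteration on a finite MDP.

    Iterates V <- T V / (T V)(x0), where
    (T V)(x) = min_a exp(c(x, a)) * sum_y P(y | x, a) V(y).
    P : (A, S, S) transition kernels, c : (S, A) running cost,
    x0 : reference state used for normalization.
    """
    A, S, _ = P.shape
    V = np.ones(S)
    rho = 1.0
    for _ in range(n_iter):
        TV = np.min(np.exp(c.T) * (P @ V), axis=0)  # pointwise min over actions
        rho = TV[x0]                                # converges to the Perron root
        V = TV / rho                                # keep the iterate normalized
    return V, np.log(rho)
```

With a single action the iteration reduces to plain power iteration on the twisted kernel, which is a quick way to sanity-check an implementation against a direct eigenvalue computation.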
Direct policy search serves as one of the workhorses in modern reinforcement learning (RL), and its applications in continuous control tasks have recently attracted increasing attention. In this work, we investigate the convergence theory of policy gradient (PG) methods for learning the linear risk-sensitive and robust controller. In particular, we develop PG methods that can be implemented in a derivative-free fashion by sampling system trajectories, and establish both global convergence and sample complexity results for the solutions of two fundamental settings in risk-sensitive and robust control: the finite-horizon linear exponential quadratic Gaussian problem and the finite-horizon linear-quadratic disturbance attenuation problem. As a by-product, our results also provide the first sample complexity bounds for the global convergence of PG methods on solving zero-sum linear-quadratic dynamic games, a nonconvex-nonconcave minimax optimization problem that serves as a baseline setting in multi-agent reinforcement learning (MARL) with continuous spaces. One feature of our algorithms is that during the learning phase, a certain level of robustness/risk-sensitivity of the controller is preserved, which we term the implicit regularization property, an essential requirement in safety-critical control systems.
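The derivative-free flavor of PG referred to above can be illustrated with a minimal two-point zeroth-order scheme on a finite-horizon linear-quadratic cost. This sketch deliberately uses a risk-neutral quadratic cost instead of the exponential LEQG objective, and all step sizes, smoothing radii, and the rollout setup are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def zeroth_order_pg_lqr(A, B, Q, R, T=20, r=0.1, lr=1e-3, n_iter=200, seed=0):
    """Derivative-free policy gradient for a finite-horizon LQR cost.

    Estimates directional derivatives of the cost of a static feedback
    gain u = -K x from pairs of perturbed rollouts, then descends.
    """
    rng = np.random.default_rng(seed)
    n, m = B.shape
    K = np.zeros((m, n))                 # initial state-feedback gain

    def cost(K):
        # Average finite-horizon quadratic cost over unit initial states.
        J = 0.0
        for x0 in np.eye(n):
            x = x0.copy()
            for _ in range(T):
                u = -K @ x
                J += x @ Q @ x + u @ R @ u
                x = A @ x + B @ u
        return J / n

    for _ in range(n_iter):
        # Two-point estimate of the directional derivative along a
        # random unit direction U in gain space.
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)
        g = (cost(K + r * U) - cost(K - r * U)) / (2 * r) * U
        K -= lr * g
    return K
```

Each update moves along the negative estimated directional derivative, so to first order the cost is non-increasing; sample-based variants replace the exact rollouts with noisy trajectories, which is where the sample complexity analysis enters.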
The paper solves constrained Dynkin games with risk-sensitive criteria, where two players are allowed to stop at two independent Poisson random intervention times, via the theory of backward stochastic differential equations. This generalizes the previous work of [Liang and Sun, Dynkin games with Poisson random intervention times, SIAM Journal on Control and Optimization, 2019] from risk-neutral criteria and a common signal time for both players to risk-sensitive criteria and two heterogeneous signal times. Furthermore, the paper establishes a connection between such constrained risk-sensitive Dynkin games and a class of stochastic differential games via Krylov's randomized stopping technique.