Linear and dynamic programs for risk-sensitive cost minimization

Published by Ari Arapostathis

Publication date: 2021

Authors: Ari Arapostathis, Vivek S. Borkar

Language: English

Research field: Optimization and Control

We derive equivalent linear and dynamic programs for infinite horizon risk-sensitive control for minimization of the asymptotic growth rate of the cumulative cost.
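
As a rough illustration of the quantity in this abstract, the sketch below computes the asymptotic growth rate of the exponentiated cumulative cost in the simplest setting: a finite-state Markov chain with no control. In that case the rate equals the logarithm of the Perron-Frobenius eigenvalue of the matrix with entries exp(c[i]) * P[i, j]. The function name, the transition matrix `P`, and the cost vector `c` are assumptions made for this example; they are not taken from the paper, which treats the controlled problem through linear and dynamic programs.

```python
import numpy as np


def risk_sensitive_growth_rate(P, c):
    """Growth rate lim (1/N) log E[exp(sum_{n<N} c(X_n))] for a fixed chain.

    Hypothetical helper for illustration only.
    P: (n, n) row-stochastic transition matrix, assumed irreducible.
    c: (n,) per-state running cost.
    """
    P = np.asarray(P, dtype=float)
    c = np.asarray(c, dtype=float)
    # Twisted kernel: M[i, j] = exp(c[i]) * P[i, j].
    # E_x[exp(sum_{n<N} c(X_n))] equals the x-th entry of M^N applied to the ones vector.
    M = np.exp(c)[:, None] * P
    # For a nonnegative irreducible matrix the spectral radius is the Perron root.
    rho = np.max(np.abs(np.linalg.eigvals(M)))
    return float(np.log(rho))


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.2, 0.8]]
    print(risk_sensitive_growth_rate(P, [0.5, 0.5]))  # constant cost: rate equals that constant
    print(risk_sensitive_growth_rate(P, [0.0, 1.0]))
```

With a constant cost the rate is that constant. With a non-constant cost, Jensen's inequality places the rate at or above the ordinary long-run average cost, which is the sense in which the criterion penalizes risk.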

Read also

Junyi Liu, Ying Cui, Jong-Shi Pang, 2020

This paper studies a structured compound stochastic program (SP) involving multiple expectations coupled by nonconvex and nonsmooth functions. We present a successive convex-programming based sampling algorithm and establish its subsequential convergence. We describe stationarity properties of the limit points for several classes of the compound SP. We further discuss probabilistic stopping rules based on a computable error bound for the algorithm. We present several risk measure minimization problems that can be formulated as such a compound stochastic program; these include generalized deviation optimization problems based on the optimized certainty equivalent and buffered probability of exceedance (bPOE), a distributionally robust bPOE optimization problem, and a multiclass classification problem employing cost-sensitive error criteria with the bPOE risk measure.

Optimization and Control

Infinite-horizon Risk-constrained Linear Quadratic Regulator with Average Cost

Feiran Zhao, Keyou You, Tamer Basar, 2021

The behaviour of a stochastic dynamical system may be largely influenced by low-probability yet extreme events. To address such occurrences, this paper proposes an infinite-horizon risk-constrained Linear Quadratic Regulator (LQR) framework with time-average cost. In addition to the standard LQR objective, the average one-stage predictive variance of the state penalty is constrained to lie within a user-specified level. By leveraging duality, its optimal solution is first shown to be stationary and affine in the state, i.e., $u(x,\lambda^*) = -K(\lambda^*)x + l(\lambda^*)$, where $\lambda^*$ is an optimal multiplier used to address the risk constraint. Then, we establish the stability of the resulting closed-loop system. Furthermore, we propose a primal-dual method with a sublinear convergence rate to find an optimal policy $u(x,\lambda^*)$. Finally, a numerical example is provided to demonstrate the effectiveness of the proposed framework and the primal-dual method.

Optimization and Control; Systems and Control

Compact linear programs for 2SAT

David Avis, Hans Raj Tiwary, 2017

For each integer $n$ we present an explicit formulation of a compact linear program, with $O(n^3)$ variables and constraints, which determines the satisfiability of any 2SAT formula with $n$ boolean variables by a single linear optimization. This contrasts with the fact that the natural polytope for this problem, formed from the convex hull of all satisfiable formulas and their satisfying assignments, has superpolynomial extension complexity. Our formulation is based on multicommodity flows. We also discuss connections of these results to the stable matching problem.

Optimization and Control; Discrete Mathematics; Data Structures and Algorithms

Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity

Kaiqing Zhang, Xiangyuan Zhang, Bin Hu, 2021

Direct policy search serves as one of the workhorses in modern reinforcement learning (RL), and its applications in continuous control tasks have recently attracted increasing attention. In this work, we investigate the convergence theory of policy gradient (PG) methods for learning the linear risk-sensitive and robust controller. In particular, we develop PG methods that can be implemented in a derivative-free fashion by sampling system trajectories, and establish both global convergence and sample complexity results in the solutions of two fundamental settings in risk-sensitive and robust control: the finite-horizon linear exponential quadratic Gaussian, and the finite-horizon linear-quadratic disturbance attenuation problems. As a by-product, our results also provide the first sample complexity for the global convergence of PG methods on solving zero-sum linear-quadratic dynamic games, a nonconvex-nonconcave minimax optimization problem that serves as a baseline setting in multi-agent reinforcement learning (MARL) with continuous spaces. One feature of our algorithms is that during the learning phase, a certain level of robustness/risk-sensitivity of the controller is preserved, which we term the implicit regularization property and which is an essential requirement in safety-critical control systems.

Optimization and Control; Artificial Intelligence; Machine Learning

On the policy improvement algorithm for ergodic risk-sensitive control

Ari Arapostathis, Anup Biswas, 2019

In this article we consider the ergodic risk-sensitive control problem for a large class of multidimensional controlled diffusions on the whole space. We study the minimization and maximization problems under either a blanket stability hypothesis, or a near-monotone assumption on the running cost. We establish the convergence of the policy improvement algorithm for these models. We also present a more general result concerning the region of attraction of the equilibrium of the algorithm.

Optimization and Control; Probability
