Convex Analytic Method Revisited: Further Optimality Results and Performance of Deterministic Stationary Control Policies


Published by Serdar Yüksel

Publication date: 2021

Paper language: English

Authors: Ari Arapostathis, Serdar Yüksel

Optimization and Control


Abstract

The convex analytic method (generalized by Borkar) has proved to be a very versatile method for the study of infinite horizon average cost optimal stochastic control problems. In this paper, we revisit the convex analytic method and make three primary contributions: (i) We present an existence result, under a near-monotone cost hypothesis, for controlled Markov models that lack weak continuity of the transition kernel but are strongly continuous in the action variable for every fixed state variable. (ii) For average cost stochastic control problems in standard Borel spaces, while existing results establish the optimality of stationary (possibly randomized) policies, few results are available on the optimality of stationary deterministic policies, and these are under rather restrictive hypotheses. We provide mild conditions under which an average cost optimal stochastic control problem admits optimal solutions that are deterministic and stationary, building upon a study of strategic measures by Feinberg. (iii) We establish conditions under which the performance under stationary deterministic policies is dense in the set of performance values under randomized stationary policies.
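For orientation, the following is a minimal sketch of the standard formulation behind the convex analytic method, not the paper's own statement. The notation is assumed for illustration: state space $X$, action space $U$, transition kernel $T(dy \mid x, u)$, and one-stage cost $c(x, u)$. The average cost problem is recast as a linear program over the convex set of invariant (ergodic) occupation measures, and stationary policies are recovered by disintegrating such a measure.

```latex
% Assumed notation (not taken from the abstract): X, U, T(dy | x, u), c(x, u).
% Set of invariant occupation measures on X x U:
\[
  \mathcal{G} \;=\; \Bigl\{ \mu \in \mathcal{P}(X \times U) \;:\;
     \mu(B \times U) = \int_{X \times U} T(B \mid x, u)\, \mu(dx, du)
     \ \text{ for all Borel } B \subset X \Bigr\}
\]
% Linear program over this convex set:
\[
  \inf_{\mu \in \mathcal{G}} \int_{X \times U} c(x, u)\, \mu(dx, du)
\]
% Disintegration into an invariant state measure and a stationary policy:
\[
  \mu(dx, du) = \pi(dx)\, \gamma(du \mid x), \qquad
  \gamma(du \mid x) = \delta_{f(x)}(du)
  \ \text{ when the policy is deterministic, } u = f(x)
\]
```

In this picture, a randomized stationary policy corresponds to a general kernel $\gamma$, while a deterministic stationary policy corresponds to a measurable map $f$; contributions (ii) and (iii) concern when the infimum is attained, or approximated, by measures of the latter type.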


Akshay Agrawal, Shane Barratt, Stephen Boyd (2019)

Many control policies used in various applications determine the input or action by solving a convex optimization problem that depends on the current state and some parameters. Common examples of such convex optimization control policies (COCPs) include the linear quadratic regulator (LQR), convex model predictive control (MPC), and convex control-Lyapunov or approximate dynamic programming (ADP) policies. These types of control policies are tuned by varying the parameters in the optimization problem, such as the LQR weights, to obtain good performance, judged by application-specific metrics. Tuning is often done by hand, or by simple methods such as a crude grid search. In this paper we propose a method to automate this process, by adjusting the parameters using an approximate gradient of the performance metric with respect to the parameters. Our method relies on recently developed methods that can efficiently evaluate the derivative of the solution of a convex optimization problem with respect to its parameters. We illustrate our method on several examples.

Optimization and Control, Machine Learning

Local Optimality Conditions for a Class of Hidden Convex Optimization

Mengmeng Song, Yong Xia, Hongying Liu (2021)

Hidden convex optimization is a class of nonconvex optimization problems that can be globally solved in polynomial time via equivalent convex programming reformulations. In this paper, we focus on checking local optimality in hidden convex optimization. We first introduce a class of hidden convex optimization problems by joining the classical nonconvex trust-region subproblem (TRS) with convex optimization (CO), and then present a comprehensive study on local optimality conditions. In order to guarantee the existence of a necessary and sufficient condition for local optimality, we need more restrictive assumptions. To our surprise, while (TRS) has at most one local non-global minimizer and (CO) has no local non-global minimizer, their joint problem could have more than one local non-global minimizer.

Optimization and Control

Optimal Deceptive and Reference Policies for Supervisory Control

Mustafa O. Karabag, Melkior Ornik, Ufuk Topcu (2019)

The use of deceptive strategies is important for an agent that attempts not to reveal his intentions in an adversarial environment. We consider a setting in which a supervisor provides a reference policy and expects an agent to follow the reference policy and perform a task. The agent may instead follow a different, deceptive policy to achieve a different task. We model the environment and the behavior of the agent with a Markov decision process, represent the tasks of the agent and the supervisor with linear temporal logic formulae, and study the synthesis of optimal deceptive policies for such agents. We also study the synthesis of optimal reference policies that prevent deceptive strategies of the agent and achieve the supervisor's task with high probability. We show that the synthesis of deceptive policies has a convex optimization problem formulation, while the synthesis of reference policies requires solving a nonconvex optimization problem.

Optimization and Control

Modeling and Control of COVID-19 Epidemic through Testing Policies

Muhammad Umar B. Niazi, Alain Kibangou, Carlos Canudas-de-Wit (2020)

Testing for infected cases is one of the most important mechanisms to control an epidemic. It makes it possible to isolate the detected infected individuals, thereby limiting the disease transmission to the susceptible population. However, despite the significance of testing policies, the recent literature on the subject lacks a control-theoretic perspective. In this work, an epidemic model that incorporates the testing rate as a control input is presented. The proposed model differentiates the undetected infected from the detected infected cases, who are assumed to be removed from the disease spreading process in the population. First, the model is estimated and validated for COVID-19 data in France. Then, two testing policies are proposed, the so-called best-effort strategy for testing (BEST) and constant optimal strategy for testing (COST). The BEST policy is a suppression strategy that provides a lower bound on the testing rate such that the epidemic switches from a spreading to a non-spreading state. The COST policy is a mitigation strategy that provides an optimal value of testing rate that minimizes the peak value of the infected population when the total stockpile of tests is limited. Both testing policies are evaluated by predicting the number of active intensive care unit (ICU) cases and the cumulative number of deaths due to COVID-19.

Optimization and Control, Systems and Control

Second-order optimality conditions for non-convex set-constrained optimization problems

Helmut Gfrerer, Jane Ye, Jinchuan Zhou (2019)

In this paper we study second-order optimality conditions for non-convex set-constrained optimization problems. For a convex set-constrained optimization problem, it is well-known that second-order optimality conditions involve the support function of the second-order tangent set. In this paper we propose two approaches for establishing second-order optimality conditions for the non-convex case. In the first approach we extend the concept of the support function so that it is applicable to general non-convex set-constrained problems, whereas in the second approach we introduce the notion of the directional regular tangent cone and apply classical results of convex duality theory. Besides the second-order optimality conditions, the novelty of our approach lies in the systematic introduction and use, respectively, of direction

Optimization and Control
