
A Linear Programming Formulation for Constrained Discounted Continuous Control for Piecewise Deterministic Markov Processes

Published by Francois Dufour
Publication date: 2014
Language: English





This paper deals with the constrained discounted control of piecewise deterministic Markov processes (PDMPs) in general Borel spaces. The control variable acts on the jump rate and transition measure, and the goal is to minimize the total expected discounted cost, composed of positive running and boundary costs, while satisfying constraints of the same form. The basic idea is to use the special features of PDMPs to rewrite the problem via an embedded discrete-time Markov chain associated with the PDMP, and then to reformulate it as an infinite-dimensional linear programming (LP) problem via the occupation measures associated with the discrete-time process. It is important to stress, however, that our new discrete-time problem is not in the same framework as a general constrained discrete-time Markov decision process and, because of that, some conditions are required to obtain the equivalence between the continuous-time problem and the LP formulation. We then provide sufficient conditions for the solvability of the associated LP problem, based on a generalization of Theorem 4.1 in [8]. In the Appendix we present the proof of this generalization which, we believe, is of interest in its own right. The paper concludes with some examples that illustrate the results obtained.
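For orientation, the occupation-measure LP for a standard discrete-time constrained discounted problem can be sketched as below. This is the textbook form only, not the paper's formulation: the abstract itself notes that the embedded problem falls outside the general constrained discrete-time framework. All symbols ($\mu$, $\nu$, $Q$, $\alpha$, $c_i$, $d_i$, $X$, $A$, $\mathbb{K}$, $n$) are notation assumed here for illustration.

```latex
% Assumed notation (not taken from the paper):
%   X, A        state and action spaces; \mathbb{K} \subseteq X \times A the admissible pairs
%   \nu         initial distribution on X;  \alpha \in (0,1) the discount factor
%   Q(B | x,a)  transition kernel;  c_0 the cost to minimize
%   c_1..c_n    constraint costs with bounds d_1..d_n
%   \mu         discounted occupation measure on \mathbb{K} (the LP variable)
\begin{aligned}
\text{minimize}\quad & \int_{\mathbb{K}} c_0(x,a)\,\mu(dx,da) \\
\text{subject to}\quad & \int_{\mathbb{K}} c_i(x,a)\,\mu(dx,da) \le d_i, \qquad i = 1,\dots,n, \\
& \mu(B \times A) = \nu(B) + \alpha \int_{\mathbb{K}} Q(B \mid x,a)\,\mu(dx,da)
  \quad \text{for every Borel } B \subseteq X, \\
& \mu \ \text{a nonnegative measure on } \mathbb{K}.
\end{aligned}
```

Both the objective and the constraints are linear in $\mu$, which is what makes the problem an LP even though the decision variable is a measure rather than a finite vector.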




Read also

O.L.V. Costa, F. Dufour (2008)
This paper deals with the long run average continuous control problem of piecewise deterministic Markov processes (PDMPs) taking values in a general Borel space and with compact action space depending on the state variable. The control variable acts on the jump rate and transition measure of the PDMP, and the running and boundary costs are assumed to be positive but not necessarily bounded. Our first main result is to obtain an optimality equation for the long run average cost in terms of a discrete-time optimality equation related to the embedded Markov chain given by the post-jump location of the PDMP. Our second main result guarantees the existence of a feedback measurable selector for the discrete-time optimality equation by establishing a connection between this equation and an integro-differential equation. Our final main result is to obtain some sufficient conditions for the existence of a solution for a discrete-time optimality inequality and an ordinary optimal feedback control for the long run average cost using the so-called vanishing discount approach.
O.L.V. Costa, F. Dufour (2008)
The main goal of this paper is to derive sufficient conditions for the existence of an optimal control strategy for the long run average continuous control problem of piecewise deterministic Markov processes (PDMPs) taking values in a general Borel space and with compact action space depending on the state variable. In order to do that we apply the so-called vanishing discount approach to obtain a solution to an average cost optimality inequality associated with the long run average cost problem. Our main assumptions are written in terms of some integro-differential inequalities related to the so-called expected growth condition, and geometric convergence of the post-jump location kernel associated with the PDMP.
O.L.V. Costa, F. Dufour (2009)
The main goal of this paper is to apply the so-called policy iteration algorithm (PIA) to the long run average continuous control problem of piecewise deterministic Markov processes (PDMPs) taking values in a general Borel space and with compact action space depending on the state variable. In order to do that we first derive some important properties for a pseudo-Poisson equation associated with the problem. We then show that the PIA converges to a solution satisfying the optimality equation under some classical hypotheses, and that this optimal solution yields an optimal control strategy in feedback form for the average control problem for the continuous-time PDMP.
We study discrete-time discounted constrained Markov decision processes (CMDPs) on Borel spaces with unbounded reward functions. In our approach the transition probability functions are weakly or set-wise continuous. The reward functions are upper semicontinuous in state-action pairs or semicontinuous in actions. Our aim is to study models with unbounded reward functions, which are often encountered in applications, e.g., in consumption/investment problems. We provide some general assumptions under which the optimization problems in CMDPs are solvable in the class of stationary randomized policies. Then, we indicate that if the initial distribution and transition probabilities are non-atomic, then using a general purification result of Feinberg and Piunovskiy, stationary optimal policies can be deterministic. Our main results are illustrated by five examples.
The objective of this work is to study continuous-time Markov decision processes on a general Borel state space with both impulsive and continuous controls for the infinite-time horizon discounted cost. The continuous-time controlled process is shown to be non-explosive under appropriate hypotheses. The so-called Bellman equation associated with this control problem is studied. Sufficient conditions ensuring the existence and the uniqueness of a bounded measurable solution to this optimality equation are provided. Moreover, it is shown that the value function of the optimization problem under consideration satisfies this optimality equation. Sufficient conditions are also presented to ensure on the one hand the existence of an optimal control strategy and on the other hand the existence of an $\varepsilon$-optimal control strategy. A decomposition of the state space into two disjoint subsets is exhibited where, roughly speaking, one should apply a gradual action or an impulsive action, respectively, to get an optimal or $\varepsilon$-optimal strategy. An interesting consequence of these results is as follows: the set of strategies that allow interventions at time $t=0$ and only immediately after natural jumps is a sufficient set for the control problem under consideration.