
The Policy Iteration Algorithm for Average Continuous Control of Piecewise Deterministic Markov Processes

Posted by Francois Dufour
Publication date: 2009
Language: English





The main goal of this paper is to apply the so-called policy iteration algorithm (PIA) to the long run average continuous control problem of piecewise deterministic Markov processes (PDMPs) taking values in a general Borel space and with compact action space depending on the state variable. To do so, we first derive some important properties of a pseudo-Poisson equation associated with the problem. We then show that, under some classical hypotheses, the PIA converges to a solution of the optimality equation, and that this optimal solution yields an optimal control strategy in feedback form for the average control problem of the continuous-time PDMP.
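The sketch below makes the policy-iteration loop concrete for a finite-state, discrete-time Markov decision process under the long run average cost criterion. It is not the PDMP algorithm of the paper. The Borel state space, the compact state-dependent action sets, and the pseudo-Poisson equation are all replaced by finite analogues. Every policy is assumed to be unichain, so that the ordinary average-cost Poisson equation $\rho + h = c_\pi + P_\pi h$, normalized by $h(0)=0$, has a unique solution. The names `evaluate_policy` and `policy_iteration`, and the two-state example data, are hypothetical.

```python
import itertools

import numpy as np


def evaluate_policy(P, c, policy, ref_state=0):
    """Policy evaluation: solve rho + h(x) = c(x, pi(x)) + sum_y P[pi(x), x, y] h(y)
    together with h(ref_state) = 0 (assumes the policy is unichain).

    P has shape (n_actions, n_states, n_states); c has shape (n_states, n_actions).
    """
    n = P.shape[1]
    states = np.arange(n)
    P_pi = P[policy, states, :]          # row x is P[pi(x), x, :]
    c_pi = c[states, policy]             # c(x, pi(x))
    # Unknown vector is (rho, h(0), ..., h(n-1)).
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    A[:n, 0] = 1.0
    A[:n, 1:] = np.eye(n) - P_pi
    b[:n] = c_pi
    A[n, 1 + ref_state] = 1.0            # normalization h(ref_state) = 0
    sol = np.linalg.solve(A, b)
    return sol[0], sol[1:]


def policy_iteration(P, c, max_iter=100):
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    n_states = P.shape[1]
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(max_iter):
        rho, h = evaluate_policy(P, c, policy)
        q = c + (P @ h).T                # q[x, a] = c(x, a) + sum_y P[a, x, y] h(y)
        greedy = np.argmin(q, axis=1)
        # Keep the current action whenever it is already a minimizer, to avoid cycling on ties.
        keep = np.isclose(q[states, policy], q.min(axis=1))
        new_policy = np.where(keep, policy, greedy)
        if np.array_equal(new_policy, policy):
            return rho, h, policy
        policy = new_policy
    raise RuntimeError("policy iteration did not stabilize")


# Hypothetical two-state example: action 1 costs one unit more but drives the chain
# towards the cheap state 0 more often.
P = np.array([
    [[0.5, 0.5], [0.5, 0.5]],   # action 0
    [[0.9, 0.1], [0.9, 0.1]],   # action 1
])
c = np.array([[0.0, 1.0], [10.0, 11.0]])

rho, h, policy = policy_iteration(P, c)
# Brute-force check over all deterministic stationary policies.
best = min(evaluate_policy(P, c, np.array(p))[0]
           for p in itertools.product(range(2), repeat=2))
print(rho, h, policy, best)
```

For this data the loop stops after one improvement step. The resulting policy uses action 1 in both states, with average cost $\rho = 2$ and relative values $h = (0, 10)$, which matches the brute-force minimum.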




Read also

O.L.V. Costa, F. Dufour, 2008
This paper deals with the long run average continuous control problem of piecewise deterministic Markov processes (PDMPs) taking values in a general Borel space and with compact action space depending on the state variable. The control variable acts on the jump rate and transition measure of the PDMP, and the running and boundary costs are assumed to be positive but not necessarily bounded. Our first main result is to obtain an optimality equation for the long run average cost in terms of a discrete-time optimality equation related to the embedded Markov chain given by the post-jump location of the PDMP. Our second main result guarantees the existence of a feedback measurable selector for the discrete-time optimality equation by establishing a connection between this equation and an integro-differential equation. Our final main result is to obtain some sufficient conditions for the existence of a solution for a discrete-time optimality inequality and an ordinary optimal feedback control for the long run average cost using the so-called vanishing discount approach.
O.L.V. Costa, F. Dufour, 2008
The main goal of this paper is to derive sufficient conditions for the existence of an optimal control strategy for the long run average continuous control problem of piecewise deterministic Markov processes (PDMPs) taking values in a general Borel space and with compact action space depending on the state variable. In order to do that we apply the so-called vanishing discount approach to obtain a solution to an average cost optimality inequality associated to the long run average cost problem. Our main assumptions are written in terms of some integro-differential inequalities related to the so-called expected growth condition, and geometric convergence of the post-jump location kernel associated to the PDMP.
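The vanishing discount idea can be illustrated in a finite-state, discrete-time analogue rather than for PDMPs. The sketch computes the $\alpha$-discounted optimal cost $V_\alpha$ by value iteration. As $\alpha \uparrow 1$, $(1-\alpha)V_\alpha(x)$ approaches the optimal average cost $\rho$, and the differences $V_\alpha(x)-V_\alpha(z)$ approach a relative value function $h$. The finite sketch says nothing about the integro-differential conditions or the optimality inequality used in the paper. The function name, the reference state $z=0$, and the toy data are assumptions made for illustration.

```python
import numpy as np


def discounted_value(P, c, alpha, tol=1e-9, max_iter=200_000):
    """Value iteration for the alpha-discounted optimal cost
    V(x) = min_a [c(x, a) + alpha * sum_y P[a, x, y] V(y)].

    P has shape (n_actions, n_states, n_states); c has shape (n_states, n_actions).
    """
    V = np.zeros(c.shape[0])
    for _ in range(max_iter):
        V_new = (c + alpha * (P @ V).T).min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    raise RuntimeError("value iteration did not converge")


# Hypothetical two-state, two-action example, laid out as P[a, x, y] and c[x, a].
P = np.array([
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.9, 0.1], [0.9, 0.1]],
])
c = np.array([[0.0, 1.0], [10.0, 11.0]])

for alpha in (0.9, 0.99, 0.999):
    V = discounted_value(P, c, alpha)
    rho_approx = (1 - alpha) * V[0]      # tends to the optimal average cost as alpha -> 1
    h_approx = V - V[0]                  # tends to a relative value function with h(0) = 0
    print(alpha, rho_approx, h_approx)
```

For this data one can check by hand what happens for every $\alpha > 1/4$. The discounted-optimal policy uses action 1 in both states, $(1-\alpha)V_\alpha(0) = 1+\alpha$, and $V_\alpha(1)-V_\alpha(0) = 10$. The printed estimates therefore tend to $\rho = 2$ and $h = (0, 10)$.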
Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems.
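Policy iteration over finite-state controllers needs the value of the current controller at every step. The sketch below is a hedged, single-agent simplification: one controller, no correlation device, and an assumed discount factor $\gamma$. Under those assumptions, evaluation reduces to a linear system over pairs (controller node, world state):

$V(q,s)=\sum_a \psi(a\mid q)\big[R(s,a)+\gamma\sum_{s',o}T(s'\mid s,a)\,O(o\mid a,s')\sum_{q'}\eta(q'\mid q,a,o)\,V(q',s')\big]$.

In the decentralized setting the same idea would apply to the product of the agents' node sets. The sketch does not implement that, nor the value-preserving transformations. All names and the random example model are hypothetical.

```python
import numpy as np


def controller_matrices(psi, T, O, eta, R):
    """Expected reward r[(q,s)] and transition matrix M[(q,s),(q2,s2)] of the
    Markov chain over (controller node, world state) pairs.

    psi[q, a]        : probability that node q selects action a
    T[a, s, s2]      : world transition probability s -> s2 under action a
    O[a, s2, o]      : probability of observing o after reaching s2 via action a
    eta[q, a, o, q2] : probability the controller moves from node q to q2
                       after taking action a and observing o
    R[s, a]          : immediate reward
    """
    n_nodes, n_states = psi.shape[0], T.shape[1]
    r = np.einsum("qa,sa->qs", psi, R).reshape(-1)
    M = np.einsum("qa,ast,ato,qaop->qspt", psi, T, O, eta)
    return r, M.reshape(n_nodes * n_states, n_nodes * n_states)


def evaluate_controller(psi, T, O, eta, R, gamma):
    """Solve V = r + gamma * M V exactly; returns V[q, s]."""
    r, M = controller_matrices(psi, T, O, eta, R)
    V = np.linalg.solve(np.eye(r.size) - gamma * M, r)
    return V.reshape(psi.shape[0], T.shape[1])


def random_stochastic(rng, shape):
    """Random array whose last axis sums to one (a family of distributions)."""
    x = rng.random(shape)
    return x / x.sum(axis=-1, keepdims=True)


# Hypothetical model: 2 states, 2 actions, 2 observations, 3 controller nodes.
rng = np.random.default_rng(0)
S, A, Obs, Q = 2, 2, 2, 3
T = random_stochastic(rng, (A, S, S))
O = random_stochastic(rng, (A, S, Obs))
psi = random_stochastic(rng, (Q, A))
eta = random_stochastic(rng, (Q, A, Obs, Q))
R = rng.normal(size=(S, A))

V = evaluate_controller(psi, T, O, eta, R, gamma=0.95)
b0 = np.array([0.5, 0.5])               # initial belief over world states
print(b0 @ V[0])                        # value of starting the controller in node 0
```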
We consider a general piecewise deterministic Markov process (PDMP) $X=\{X_t\}_{t\geqslant 0}$ with measure-valued generator $\mathcal{A}$, for which the conditional distribution function of the inter-occurrence time is not necessarily absolutely continuous. A general form of the exponential martingales is presented as $$M^f_t=\frac{f(X_t)}{f(X_0)}\left[\mathrm{Sexp}\left(\int_{(0,t]}\frac{\mathrm{d}L(\mathcal{A}f)_s}{f(X_{s-})}\right)\right]^{-1}.$$ Using this exponential martingale as a likelihood ratio process, we define a new probability measure. It is shown that the original process remains a general PDMP under the new probability measure, and we find the new measure-valued generator and its domain.
Sean D Lawley, 2019
The time it takes the fastest searcher out of $N\gg 1$ searchers to find a target determines the timescale of many physical, chemical, and biological processes. This time is called an extreme first passage time (FPT) and is typically much faster than the FPT of a single searcher. Extreme FPTs of diffusion have been studied for decades, but little is known for other types of stochastic processes. In this paper, we study the distribution of extreme FPTs of piecewise deterministic Markov processes (PDMPs). PDMPs are a broad class of stochastic processes that evolve deterministically between random events. Using classical extreme value theory, we prove general theorems which yield the distribution and moments of extreme FPTs in the limit of many searchers based on the short time distribution of the FPT of a single searcher. We then apply these theorems to some canonical PDMPs, including run and tumble searchers in one, two, and three space dimensions. We discuss our results in the context of some biological systems and show how our approach accounts for an unphysical property of diffusion which can be problematic for extreme statistics.
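A small Monte Carlo experiment illustrates the contrast drawn in this abstract. As an assumed example of a PDMP, each searcher is a one-dimensional run-and-tumble (telegraph) process. It starts at distance $L$ from a target at the origin, moves at constant speed $v$, and reverses direction at the jump times of a Poisson process, so its motion between jumps is deterministic. Because its speed is finite, the fastest of $N$ such searchers can never arrive before $L/v$. For $N$ independent diffusing searchers, by contrast, the extreme FPT keeps shrinking, roughly like $L^2/(4D\ln N)$ for large $N$. The Brownian sampler uses the fact that the time to cover distance $L$ has the law of $L^2/(2DZ^2)$ with $Z$ standard normal. The parameter values, function names, and that sampler are choices made for this sketch, not taken from the paper.

```python
import math
import random


def run_and_tumble_fpt(rng, L=1.0, v=1.0, rate=1.0, max_time=20.0):
    """First passage time to 0 of a 1D telegraph (run-and-tumble) searcher started at x = L.

    The searcher moves at speed v and reverses direction at the jump times of a
    Poisson process with the given rate. Returns math.inf if the target is not
    reached before max_time (slow searchers cannot be the fastest one anyway).
    """
    x, t = L, 0.0
    direction = rng.choice((-1, 1))          # initial direction chosen uniformly
    while t < max_time:
        tau = rng.expovariate(rate)          # time until the next tumble
        if direction == -1 and x <= v * tau:
            return t + x / v                 # reaches the target before tumbling
        x += direction * v * tau
        t += tau
        direction = -direction
    return math.inf


def diffusive_fpt(rng, L=1.0, D=1.0):
    """Exact sample of the time for 1D Brownian motion (diffusivity D) to travel distance L."""
    z = 0.0
    while z == 0.0:                          # guard against the measure-zero draw z = 0
        z = rng.gauss(0.0, 1.0)
    return L * L / (2.0 * D * z * z)


def extreme_fpt(sample_fpt, n, rng):
    """Fastest of n independent searchers: the minimum of n i.i.d. FPT samples."""
    return min(sample_fpt(rng) for _ in range(n))


rng = random.Random(1)
N = 10_000
L, v, D = 1.0, 1.0, 1.0
t_rt = extreme_fpt(lambda r: run_and_tumble_fpt(r, L=L, v=v), N, rng)
t_diff = extreme_fpt(lambda r: diffusive_fpt(r, L=L, D=D), N, rng)
print(t_rt, L / v)                               # run-and-tumble: bounded below by L / v
print(t_diff, L * L / (4 * D * math.log(N)))     # diffusion: compare with L^2 / (4 D ln N)
```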