ﻻ يوجد ملخص باللغة العربية
The objective of this work is to study continuous-time Markov decision processes on a general Borel state space with both impulsive and continuous controls for the infinite-time horizon discounted cost. The continuous-time controlled process is shown to be non explosive under appropriate hypotheses. The so-called Bellman equation associated to this control problem is studied. Sufficient conditions ensuring the existence and the uniqueness of a bounded measurable solution to this optimality equation are provided. Moreover, it is shown that the value function of the optimization problem under consideration satisfies this optimality equation. Sufficient conditions are also presented to ensure on one hand the existence of an optimal control strategy and on the other hand the existence of an $varepsilon$-optimal control strategy. The decomposition of the state space in two disjoint subsets is exhibited where roughly speaking, one should apply a gradual action or an impulsive action correspondingly to get an optimal or $varepsilon$-optimal strategy. An interesting consequence of our previous results is as follows: the set of strategies that allow interventions at time $t=0$ and only immediately after natural jumps is a sufficient set for the control problem under consideration.
This paper extends to Continuous-Time Jump Markov Decision Processes (CTJMDP) the classic result for Markov Decision Processes stating that, for a given initial state distribution, for every policy there is a (randomized) Markov policy, which can be
In a variety of applications, an agents success depends on the knowledge that an adversarial observer has or can gather about the agents decisions. It is therefore desirable for the agent to achieve a task while reducing the ability of an observer to
This paper describes the structure of solutions to Kolmogorovs equations for nonhomogeneous jump Markov processes and applications of these results to control of jump stochastic systems. These equations were studied by Feller (1940), who clarified in
We study the problem of synthesizing a controller that maximizes the entropy of a partially observable Markov decision process (POMDP) subject to a constraint on the expected total reward. Such a controller minimizes the predictability of an agents t
This papers deals with the constrained discounted control of piecewise deterministic Markov process (PDMPs) in general Borel spaces. The control variable acts on the jump rate and transition measure, and the goal is to minimize the total expected dis