ترغب بنشر مسار تعليمي؟ اضغط هنا

Dynamic Programming Subject to Total Variation Distance Ambiguity

250   0   0.0 ( 0 )
 نشر من قبل Ioannis Tzortzis
 تاريخ النشر 2014
  مجال البحث
والبحث باللغة English




اسأل ChatGPT حول البحث

The aim of this paper is to address optimality of stochastic control strategies via dynamic programming subject to total variation distance ambiguity on the conditional distribution of the controlled process. We formulate the stochastic control problem using minimax theory, in which the control minimizes the pay-off while the conditional distribution, from the total variation distance set, maximizes it. First, we investigate the maximization of a linear functional on the space of probability measures on abstract spaces, among those probability measures which are within a total variation distance from a nominal probability measure, and then we give the maximizing probability measure in closed form. Second, we utilize the solution of the maximization to solve minimax stochastic control with deterministic control strategies, under a Markovian and a non-Markovian assumption, on the conditional distributions of the controlled process. The results of this part include: 1) Minimax optimization subject to total variation distance ambiguity constraint; 2) new dynamic programming recursions, which involve the oscillator seminorm of the value function, in addition to the standard terms; 3) new infinite horizon discounted dynamic programming equation, the associated contractive property, and a new policy iteration algorithm. Finally, we provide illustrative examples for both the finite and infinite horizon cases. For the infinite horizon case we invoke the new policy iteration algorithm to compute the optimal strategies.

قيم البحث

اقرأ أيضاً

We analyze the infinite horizon minimax average cost Markov Control Model (MCM), for a class of controlled process conditional distributions, which belong to a ball, with respect to total variation distance metric, centered at a known nominal control led conditional distribution with radius $Rin [0,2]$, in which the minimization is over the control strategies and the maximization is over conditional distributions. Upon performing the maximization, a dynamic programming equation is obtained which includes, in addition to the standard terms, the oscillator semi-norm of the cost-to-go. First, the dynamic programming equation is analyzed for finite state and control spaces. We show that if the nominal controlled process distribution is irreducible, then for every stationary Markov control policy the maximizing conditional distribution of the controlled process is also irreducible for $R in [0,R_{max}]$. Second, the generalized dynamic programming is analyzed for Borel spaces. We derive necessary and sufficient conditions for any control strategy to be optimal. Through our analysis, new dynamic programming equations and new policy iteration algorithms are derived. The main feature of the new policy iteration algorithms (which are applied for finite alphabet spaces) is that the policy evaluation and policy improvement steps are performed by using the maximizing conditional distribution, which is obtained via a water filling solution. Finally, the application of the new dynamic programming equations and the corresponding policy iteration algorithms are shown via illustrative examples.
105 - Sven Leyffer , Paul Manns 2021
We propose a trust-region method that solves a sequence of linear integer programs to tackle integer optimal control problems regularized with a total variation penalty. The total variation penalty allows us to prove the existence of minimizers of the integer optimal control problem. We introduce a local optimality concept for the problem, which arises from the infinite-dimensional perspective. In the case of a one-dimensional domain of the control function, we prove convergence of the iterates produced by our algorithm to points that satisfy first-order stationarity conditions for local optimality. We demonstrate the theoretical findings on a computational example.
We study the synthesis of a policy in a Markov decision process (MDP) following which an agent reaches a target state in the MDP while minimizing its total discounted cost. The problem combines a reachability criterion with a discounted cost criterio n and naturally expresses the completion of a task with probabilistic guarantees and optimal transient performance. We first establish that an optimal policy for the considered formulation may not exist but that there always exists a near-optimal stationary policy. We additionally provide a necessary and sufficient condition for the existence of an optimal policy. We then restrict our attention to stationary deterministic policies and show that the decision problem associated with the synthesis of an optimal stationary deterministic policy is NP-complete. Finally, we provide an exact algorithm based on mixed-integer linear programming and propose an efficient approximation algorithm based on linear programming for the synthesis of an optimal stationary deterministic policy.
124 - Guodong Xu , Yu Xia , Hui Ji 2018
Data clustering is a fundamental problem with a wide range of applications. Standard methods, eg the $k$-means method, usually require solving a non-convex optimization problem. Recently, total variation based convex relaxation to the $k$-means model has emerged as an attractive alternative for data clustering. However, the existing results on its exact clustering property, ie, the condition imposed on data so that the method can provably give correct identification of all cluster memberships, is only applicable to very specific data and is also much more restrictive than that of some other methods. This paper aims at the revisit of total variation based convex clustering, by proposing a weighted sum-of-$ell_1$-norm relating convex model. Its exact clustering property established in this paper, in both deterministic and probabilistic context, is applicable to general data and is much sharper than the existing results. These results provided good insights to advance the research on convex clustering. Moreover, the experiments also demonstrated that the proposed convex model has better empirical performance when be compared to standard clustering methods, and thus it can see its potential in practice.
69 - Yiwei Qiu 2020
Continuous-time random disturbances (also called stochastic excitations) due to increasing renewable generation have an increasing impact on power system dynamics; However, except from the Monte Carlo simulation, most existing methods for quantifying this impact are intrusive, meaning they are not based on commercial simulation software and hence are difficult to use for power utility companies. To fill this gap, this paper proposes an efficient and nonintrusive method for quantifying uncertainty in dynamic power systems subject to stochastic excitations. First, the Gaussian or non-Gaussian stochastic excitations are modeled with an It^{o} process as stochastic differential equations. Then, the It^{o} process is spectrally represented by independent Gaussian random parameters, which enables the polynomial chaos expansion (PCE) of the system dynamic response to be calculated via an adaptive sparse probabilistic collocation method. Finally, the probability distribution and the high-order moments of the system dynamic response and performance index are accurately and efficiently quantified. The proposed nonintrusive method is based on commercial simulation software such as PSS/E with carefully designed input signals, which ensures ease of use for power utility companies. The proposed method is validated via case studies of IEEE 39-bus and 118-bus test systems.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا