Markov Decision Process-based Resilience Enhancement for Distribution Systems: An Approximate Dynamic Programming Approach

Publication date: 2019

Language: English

Authors: Chong Wang, Ping Ju, Shunbo Lei

Optimization and Control

Because failures in distribution systems caused by extreme weather events directly result in consumer outages, this paper proposes a state-based decision-making model that aims to mitigate loss of load and thereby improve distribution system resilience as an event unfolds. The sequentially uncertain system states (e.g., feeder line on/off states) driven by the unfolding event are modeled as Markov states, and the transition probabilities from one Markov state to another are determined by the component failures that the event causes. A recursive optimization model based on Markov decision processes (MDP) is developed to select state-based actions, i.e., system reconfiguration, at each decision time. To overcome the curse of dimensionality caused by the enormous number of states and actions, an approximate dynamic programming (ADP) approach based on post-decision states and iteration is used to solve the proposed MDP-based model. The IEEE 33-bus and IEEE 123-bus systems are used to validate the proposed model.
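To make the recursive, state-based structure concrete, the following is a minimal sketch of exact backward induction on a hypothetical two-feeder system. It is not the paper's model and not its ADP algorithm: the failure probabilities, load, switching cost, horizon, and all function names are assumptions chosen for illustration, and the state space is small enough to enumerate exactly, which is precisely what the curse of dimensionality rules out on realistic systems.

```python
from itertools import product

# All numbers below are illustrative assumptions, not values from the paper.
P_FAIL = (0.3, 0.1)   # per-step failure probability of feeder line 0 and line 1
LOAD = 1.0            # load lost in a step when the energized line is down
SWITCH_COST = 0.05    # penalty for reconfiguring to the other line
HORIZON = 3           # number of decision times

# Markov state: (line 0 up?, line 1 up?, index of the line currently in use)
STATES = list(product((0, 1), (0, 1), (0, 1)))
ACTIONS = (0, 1)      # action = index of the line to use for the next step


def line_transition_prob(up, up_next):
    """Probability that line statuses move from `up` to `up_next` in one step.

    Lines fail independently; a failed line stays failed (no repair).
    """
    prob = 1.0
    for status, status_next, p_fail in zip(up, up_next, P_FAIL):
        if status == 0:
            prob *= 1.0 if status_next == 0 else 0.0
        else:
            prob *= p_fail if status_next == 0 else 1.0 - p_fail
    return prob


def solve():
    """Backward induction; returns stage-0 values and one policy per decision time."""
    value = {s: 0.0 for s in STATES}   # terminal value: nothing left to lose
    policies = []
    for _ in range(HORIZON):
        new_value, policy = {}, {}
        for state in STATES:
            up, active = state[:2], state[2]
            best_cost, best_action = None, None
            for action in ACTIONS:
                # Immediate reconfiguration cost, then expected loss of load
                # plus cost-to-go over all possible next line statuses.
                cost = SWITCH_COST if action != active else 0.0
                for up_next in product((0, 1), repeat=2):
                    prob = line_transition_prob(up, up_next)
                    if prob == 0.0:
                        continue
                    lost = LOAD if up_next[action] == 0 else 0.0
                    cost += prob * (lost + value[up_next + (action,)])
                if best_cost is None or cost < best_cost:
                    best_cost, best_action = cost, action
            new_value[state] = best_cost
            policy[state] = best_action
        value = new_value
        policies.append(policy)
    policies.reverse()   # policies[0] is the first decision time
    return value, policies


if __name__ == "__main__":
    value, policies = solve()
    print(value[(1, 1, 0)], policies[0][(1, 1, 0)])
```

With only eight states the expectation over next states can be computed exactly. An ADP method of the kind the abstract describes would replace the exact table `value` with an iteratively updated approximation, so the sketch should be read as the baseline that such an approximation stands in for.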

Dynamic Virtual Machine Management via Approximate Markov Decision Process

Zhenhua Han, Haisheng Tan, Guihai Chen (2016)

Efficient virtual machine (VM) management can dramatically reduce energy consumption in data centers. Existing VM management algorithms fall into two categories based on whether the VMs' resource demands are assumed to be static or dynamic. The former category fails to maximize resource utilization because it cannot adapt to the dynamic nature of VMs' resource demands. Most approaches in the latter category are heuristic and lack theoretical performance guarantees. In this work, we formulate dynamic VM management as a large-scale Markov Decision Process (MDP) problem and derive an optimal solution. Our analysis of real-world data traces supports our choice of modeling approach. However, solving the large-scale MDP problem suffers from the curse of dimensionality. Therefore, we further exploit the special structure of the problem and propose an approximate MDP-based dynamic VM management method, called MadVM. We prove the convergence of MadVM and analyze the bound of its approximation error. Moreover, MadVM can be implemented in a distributed system, which should suit the needs of real data centers. Extensive simulations based on two real-world workload traces show that MadVM achieves significant performance gains over two existing baseline approaches in power consumption, resource shortage, and the number of VM migrations. Specifically, the more intensely the resource demands fluctuate, the larger MadVM's advantage over the baselines.

Networking and Internet Architecture

Guaranteed Bounds for General Approximate Dynamic Programming

Yajing Liu, Edwin K. P. Chong, Ali Pezeshki (2014)

In this paper, we develop a systematic approach to deriving guaranteed bounds for approximate dynamic programming (ADP) schemes in optimal control problems. Our approach is inspired by our recent results on bounding the performance of greedy strategies in optimization of string-submodular functions over a finite horizon. The approach is to derive a string-submodular optimization problem for which the optimal strategy is the optimal control solution and the greedy strategy is the ADP solution. Using this approach, we show that any ADP solution achieves a performance that is at least a factor of $\beta$ of the performance of the optimal control solution, which satisfies Bellman's optimality principle. The factor $\beta$ depends on the specific ADP scheme, as we explicitly characterize. To illustrate the applicability of our bounding technique, we present examples of ADP schemes, including the popular rollout method.

Optimization and Control

An Optimal-Storage Approach to Semidefinite Programming using Approximate Complementarity

Lijun Ding, Alp Yurtsever, Volkan Cevher (2019)

This paper develops a new storage-optimal algorithm that provably solves generic semidefinite programs (SDPs) in standard form. This method is particularly effective for weakly constrained SDPs. The key idea is to formulate an approximate complementarity principle: Given an approximate solution to the dual SDP, the primal SDP has an approximate solution whose range is contained in the eigenspace with small eigenvalues of the dual slack matrix. For weakly constrained SDPs, this eigenspace has very low dimension, so this observation significantly reduces the search space for the primal solution. This result suggests an algorithmic strategy that can be implemented with minimal storage: (1) Solve the dual SDP approximately; (2) compress the primal SDP to the eigenspace with small eigenvalues of the dual slack matrix; (3) solve the compressed primal SDP. The paper also provides numerical experiments showing that this approach is successful for a range of interesting large-scale SDPs.

Optimization and Control, Machine Learning

A General Framework for Bounding Approximate Dynamic Programming Schemes

Yajing Liu, Edwin Chong, Ali Pezeshki (2018)

For years, there has been interest in approximation methods for solving dynamic programming problems, because of the inherent complexity in computing optimal solutions characterized by Bellman's principle of optimality. A wide range of approximate dynamic programming (ADP) methods now exists. It is of great interest to guarantee that the performance of an ADP scheme is at least some known fraction, say $\beta$, of optimal. This paper introduces a general approach to bounding the performance of ADP methods, in this sense, in the stochastic setting. The approach is based on new results for bounding greedy solutions in string optimization problems, where one has to choose a string (ordered set) of actions to maximize an objective function. This bounding technique is inspired by submodularity theory, but submodularity is not required for establishing bounds. Instead, the bounding is based on quantifying certain notions of curvature of string functions; the smaller the curvatures, the better the bound. The key insight is that any ADP scheme is a greedy scheme for some surrogate string objective function that coincides in its optimal solution and value with those of the original optimal control problem. The ADP scheme is then amenable to the bounding technique mentioned above, and the curvatures of the surrogate objective determine the value $\beta$ of the bound. The surrogate objective and its curvatures depend on the specific ADP scheme.
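As a small illustration of what "greedy versus optimal in a string optimization problem" means, the sketch below compares a greedy string of actions with the exhaustively optimal string on a hypothetical discounted-coverage objective. The objective, the action names, the discount, and the string length are all assumptions made for this example; the ratio it computes is an empirical value for this one instance, not the curvature-based bound $\beta$ derived in the paper.

```python
from itertools import product

# Hypothetical instance: each action covers a set of items, and items newly
# covered at position t of the string are worth DISCOUNT**t each.
COVER = {"x": {1, 2, 3, 4}, "y": {1, 2, 5}, "z": {3, 4, 6}}
DISCOUNT = 0.9
LENGTH = 2   # number of actions in the string


def string_value(actions):
    """Order-dependent objective: discounted count of newly covered items."""
    covered, total = set(), 0.0
    for t, action in enumerate(actions):
        new_items = COVER[action] - covered
        total += (DISCOUNT ** t) * len(new_items)
        covered |= new_items
    return total


def greedy_string():
    """Build the string one action at a time, maximizing the value so far."""
    chosen = []
    for _ in range(LENGTH):
        best = max(COVER, key=lambda a: string_value(chosen + [a]))
        chosen.append(best)
    return chosen


def optimal_string():
    """Exhaustive search over all strings of the given length."""
    return list(max(product(COVER, repeat=LENGTH), key=string_value))


if __name__ == "__main__":
    greedy, best = greedy_string(), optimal_string()
    ratio = string_value(greedy) / string_value(best)
    print(greedy, best, ratio)
```

In this instance the greedy string commits to the largest first gain and ends up strictly below the optimum, which is the kind of gap a guaranteed fraction $\beta$ is meant to bound from below.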

Optimization and Control

Convergence Analysis of the Approximate Newton Method for Markov Decision Processes

Thomas Furmston, Guy Lever (2013)

Recently, two approximate Newton methods were proposed for the optimisation of Markov Decision Processes. While these methods were shown to have desirable properties, such as a guarantee that the preconditioner is negative-semidefinite when the policy is $\log$-concave with respect to the policy parameters, and were demonstrated to have strong empirical performance in challenging domains, such as the game of Tetris, no convergence analysis was provided. The purpose of this paper is to provide such an analysis. We start by providing a detailed analysis of the Hessian of a Markov Decision Process, which consists of a negative-semidefinite component, a positive-semidefinite component, and a remainder term. The first part of our analysis details how the negative-semidefinite and positive-semidefinite components relate to each other, and how these two terms contribute to the Hessian. The next part of our analysis shows that under certain conditions, relating to the richness of the policy class, the remainder term in the Hessian vanishes in the vicinity of a local optimum. Finally, we bound the behaviour of this remainder term in terms of the mixing time of the Markov chain induced by the policy parameters; this part of the analysis is applicable over the entire parameter space. Given this analysis of the Hessian, we then provide our local convergence analysis of the approximate Newton framework.

Optimization and Control