Approximate Dynamic Programming via a Smoothed Linear Program


Added by Ciamac Moallemi

Publication date 2009

Language: English

Authors V. V. Desai - V. F. Farias - C. C. Moallemi

Optimization and Control


Abstract

We present a novel linear program for approximating the dynamic programming cost-to-go function in high-dimensional stochastic control problems. LP approaches to approximate DP have typically relied on a natural "projection" of a well-studied linear program for exact dynamic programming. Such programs restrict attention to approximations that are lower bounds on the optimal cost-to-go function. Our program, the "smoothed approximate linear program", is distinct from such approaches: it relaxes the restriction to lower-bounding approximations in an appropriate fashion while remaining computationally tractable. Doing so appears to have several advantages. First, we demonstrate substantially superior bounds on the quality of approximation to the optimal cost-to-go function afforded by our approach. Second, experiments with our approach on a challenging problem (the game of Tetris) show that it outperforms the existing LP approach (which has previously been shown to be competitive with several ADP algorithms) by an order of magnitude.
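The lower-bound restriction mentioned in the abstract can be illustrated on a toy problem. The sketch below is not taken from the paper: the 3-state MDP, its costs and transition probabilities, and the helper names are all made up for illustration, and the slack-variable form of the relaxation (J <= TJ + s with s >= 0) is an assumption about how the constraint might be loosened. It checks that a vector satisfying the exact-LP constraint J <= TJ lies below the optimal cost-to-go, and computes the slack a non-lower-bounding candidate would need.

```python
# Toy 3-state, 2-action discounted-cost MDP. All numbers are hypothetical.
ALPHA = 0.9  # discount factor
COST = [[1.0, 2.0], [0.5, 1.5], [2.0, 0.2]]  # COST[x][a]
P = [  # P[x][a][y]: probability of moving from x to y under action a
    [[0.8, 0.2, 0.0], [0.1, 0.0, 0.9]],
    [[0.0, 0.9, 0.1], [0.5, 0.5, 0.0]],
    [[0.3, 0.0, 0.7], [0.0, 0.4, 0.6]],
]


def bellman(J):
    """Bellman operator: (TJ)(x) = min_a [cost(x, a) + ALPHA * E[J(next state)]]."""
    return [
        min(
            COST[x][a] + ALPHA * sum(p * j for p, j in zip(P[x][a], J))
            for a in range(2)
        )
        for x in range(3)
    ]


def optimal_cost_to_go(iters=2000):
    """Value iteration; converges because T is an ALPHA-contraction."""
    J = [0.0, 0.0, 0.0]
    for _ in range(iters):
        J = bellman(J)
    return J


def is_lp_feasible(J, tol=1e-9):
    """Constraint of the exact LP: J <= TJ componentwise."""
    return all(j <= t + tol for j, t in zip(J, bellman(J)))


def slack(J):
    """Smallest s >= 0 with J <= TJ + s (assumed form of the relaxation)."""
    return [max(j - t, 0.0) for j, t in zip(J, bellman(J))]


J_star = optimal_cost_to_go()
J_low = [j - 1.0 for j in J_star]                         # feasible, so a lower bound
J_mixed = [J_star[0] + 0.5, J_star[1] - 0.5, J_star[2]]   # not a lower bound

print(is_lp_feasible(J_low), is_lp_feasible(J_mixed))
print(slack(J_mixed))
```

In this sketch `J_mixed` overshoots the optimal cost-to-go in one state, so the exact-LP constraint rejects it; allowing a positive slack in that state admits it. How the total slack is budgeted or penalized is left out here, since the abstract does not specify it.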


A General Framework for Bounding Approximate Dynamic Programming Schemes

Yajing Liu, Edwin Chong, Ali Pezeshki (2018)

For years, there has been interest in approximation methods for solving dynamic programming problems, because of the inherent complexity in computing optimal solutions characterized by Bellman's principle of optimality. A wide range of approximate dynamic programming (ADP) methods now exists. It is of great interest to guarantee that the performance of an ADP scheme be at least some known fraction, say $\beta$, of optimal. This paper introduces a general approach to bounding the performance of ADP methods, in this sense, in the stochastic setting. The approach is based on new results for bounding greedy solutions in string optimization problems, where one has to choose a string (ordered set) of actions to maximize an objective function. This bounding technique is inspired by submodularity theory, but submodularity is not required for establishing bounds. Instead, the bounding is based on quantifying certain notions of curvature of string functions; the smaller the curvatures, the better the bound. The key insight is that any ADP scheme is a greedy scheme for some surrogate string objective function that coincides in its optimal solution and value with those of the original optimal control problem. The ADP scheme then yields to the bounding technique mentioned above, and the curvatures of the surrogate objective determine the value $\beta$ of the bound. The surrogate objective and its curvatures depend on the specific ADP scheme.

Optimization and Control

Guaranteed Bounds for General Approximate Dynamic Programming

Yajing Liu, Edwin K. P. Chong, Ali Pezeshki (2014)

In this paper, we develop a systematic approach to deriving guaranteed bounds for approximate dynamic programming (ADP) schemes in optimal control problems. Our approach is inspired by our recent results on bounding the performance of greedy strategies in optimization of string-submodular functions over a finite horizon. The approach is to derive a string-submodular optimization problem, for which the optimal strategy is the optimal control solution and the greedy strategy is the ADP solution. Using this approach, we show that any ADP solution achieves a performance that is at least a factor of $\beta$ of the performance of the optimal control solution, which satisfies Bellman's optimality principle. The factor $\beta$ depends on the specific ADP scheme, as we will explicitly characterize. To illustrate the applicability of our bounding technique, we present examples of ADP schemes, including the popular rollout method.

Optimization and Control

Dynamic Programming and Linear Programming for Odds Problem

Sachika Kurokawa, Tomomi Matsui (2021)

This paper discusses the odds problem, proposed by Bruss in 2000, and its variants. A recurrence relation called a dynamic programming (DP) equation is used to find an optimal stopping policy of the odds problem and its variants. In 2013, Buchbinder, Jain, and Singh proposed a linear programming (LP) formulation for finding an optimal stopping policy of the classical secretary problem, which is a special case of the odds problem. The proposed linear programming problem, which maximizes the probability of a win, differs from the DP equations that have long been known. This paper shows that an ordinary DP equation is a modification of the dual problem of linear programming including the LP formulation proposed by Buchbinder, Jain, and Singh.
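As a rough illustration of the kind of DP recurrence the abstract refers to, the sketch below runs a standard backward induction for the odds problem: independent trials with success probabilities p_1, ..., p_n, where the goal is to stop on the last success. It is not necessarily the exact equation used in the paper, and the function name `odds_dp` and its return convention are hypothetical. The secretary problem is recovered by taking p_i = 1/i.

```python
from fractions import Fraction


def odds_dp(p):
    """Backward induction for the odds problem (illustrative sketch).

    p[k] is the success probability of trial k + 1; trials are independent.
    Returns (win_probability, stop_from): the policy is to stop at the first
    success whose 1-based index is at least stop_from.
    """
    n = len(p)
    value = Fraction(0)     # best win probability using only later trials
    no_later = Fraction(1)  # probability that no later trial succeeds
    stop_from = n + 1
    for k in range(n - 1, -1, -1):
        # Stopping on a success at trial k + 1 wins iff no later success occurs.
        if no_later >= value:
            stop_from = k + 1
        # DP equation: on success take the better of stop/continue, else continue.
        value = p[k] * max(no_later, value) + (1 - p[k]) * value
        no_later *= 1 - p[k]
    return value, stop_from


# Secretary problem with 4 candidates: candidate i is the best so far w.p. 1/i.
win, start = odds_dp([Fraction(1, i) for i in range(1, 5)])
print(win, start)  # 11/24 2
```

Exact fractions are used so that the tie-breaking comparison between stopping and continuing is not affected by floating-point rounding.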

Optimization and Control Applications

Approximate Dynamic Programming for Delivery Time Slot Pricing: a Sensitivity Analysis

Denis Lebedev, Kostas Margellos, Paul Goulart (2020)

We consider the revenue management problem of finding profit-maximising prices for delivery time slots in the context of attended home delivery. This multi-stage optimal control problem admits a dynamic programming formulation that is intractable for realistic problem sizes due to the so-called curse of dimensionality. Therefore, we study three approximate dynamic programming algorithms both from a control-theoretical perspective and in a parametric numerical case study. Our numerical analysis is based on real-world data, from which we generate multiple scenarios to stress-test the robustness of the pricing policies to errors in model parameter estimates. Our theoretical analysis and numerical benchmark tests show that one of these algorithms, namely gradient-bounded dynamic programming, dominates the others with respect to computation time and profit-generation capabilities of the delivery slot pricing policies that it generates. Finally, we show that uncertainty in the estimates of the model parameters further increases the profit-generation dominance of this approach.

Optimization and Control

Markov Decision Process-based Resilience Enhancement for Distribution Systems: An Approximate Dynamic Programming Approach

Chong Wang, Ping Ju, Shunbo Lei (2019)

Because failures in distribution systems caused by extreme weather events directly result in consumer outages, this paper proposes a state-based decision-making model with the objective of mitigating loss of load to improve distribution system resilience throughout the unfolding events. The sequentially uncertain system states, e.g., feeder line on/off states, driven by the unfolding events are modeled as Markov states, and the transition probabilities from one Markov state to another throughout the unfolding events are determined by the component failures caused by those events. A recursive optimization model based on Markov decision processes (MDP) is developed to make state-based actions, i.e., system reconfiguration, at each decision time. To overcome the curse of dimensionality caused by the enormous number of states and actions, an approximate dynamic programming (ADP) approach based on post-decision states and iteration is used to solve the proposed MDP-based model. The IEEE 33-bus and IEEE 123-bus systems are used to validate the proposed model.

Optimization and Control