
Approximate Dynamic Programming via a Smoothed Linear Program

Published by Ciamac Moallemi
Publication date: 2009
Research language: English





We present a novel linear program for the approximation of the dynamic programming cost-to-go function in high-dimensional stochastic control problems. LP approaches to approximate DP have typically relied on a natural "projection" of a well-studied linear program for exact dynamic programming. Such programs restrict attention to approximations that are lower bounds to the optimal cost-to-go function. Our program, the "smoothed approximate linear program", is distinct from such approaches and relaxes the restriction to lower-bounding approximations in an appropriate fashion while remaining computationally tractable. Doing so appears to have several advantages. First, we demonstrate substantially superior bounds on the quality of approximation to the optimal cost-to-go function afforded by our approach. Second, experiments with our approach on a challenging problem (the game of Tetris) show that the approach outperforms the existing LP approach (which has previously been shown to be competitive with several ADP algorithms) by an order of magnitude.
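
For orientation, the contrast drawn above can be sketched in symbols. The notation is an assumption introduced here, not taken from the abstract: $\Phi$ is a matrix of basis functions, $r$ a weight vector, $T$ the Bellman operator of a cost-minimization problem, $\nu$ a state-relevance weighting, $s$ a vector of slack variables, $\pi$ a weighting over states, and $\theta \ge 0$ a violation budget.

```latex
\begin{align*}
\text{(approximate LP)}\quad
  & \max_{r}\ \nu^\top \Phi r
  && \text{s.t.}\quad \Phi r \le T\Phi r, \\
\text{(smoothed LP)}\quad
  & \max_{r,\,s}\ \nu^\top \Phi r
  && \text{s.t.}\quad \Phi r \le T\Phi r + s,\quad \pi^\top s \le \theta,\quad s \ge 0.
\end{align*}
```

In this sketch the constraint $\Phi r \le T\Phi r$ is what forces the approximation to be a lower bound on the optimal cost-to-go function. The slack vector $s$ allows that constraint to be violated within the budget $\theta$, and with strictly positive $\pi$ the choice $\theta = 0$ recovers the first program.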




Read also

For years, there has been interest in approximation methods for solving dynamic programming problems, because of the inherent complexity in computing optimal solutions characterized by Bellman's principle of optimality. A wide range of approximate dynamic programming (ADP) methods now exists. It is of great interest to guarantee that the performance of an ADP scheme be at least some known fraction, say $\beta$, of optimal. This paper introduces a general approach to bounding the performance of ADP methods, in this sense, in the stochastic setting. The approach is based on new results for bounding greedy solutions in string optimization problems, where one has to choose a string (ordered set) of actions to maximize an objective function. This bounding technique is inspired by submodularity theory, but submodularity is not required for establishing bounds. Instead, the bounding is based on quantifying certain notions of curvature of string functions; the smaller the curvatures, the better the bound. The key insight is that any ADP scheme is a greedy scheme for some surrogate string objective function that coincides in its optimal solution and value with those of the original optimal control problem. The ADP scheme then yields to the bounding technique mentioned above, and the curvatures of the surrogate objective determine the value $\beta$ of the bound. The surrogate objective and its curvatures depend on the specific ADP.
In this paper, we will develop a systematic approach to deriving guaranteed bounds for approximate dynamic programming (ADP) schemes in optimal control problems. Our approach is inspired by our recent results on bounding the performance of greedy strategies in optimization of string-submodular functions over a finite horizon. The approach is to derive a string-submodular optimization problem, for which the optimal strategy is the optimal control solution and the greedy strategy is the ADP solution. Using this approach, we show that any ADP solution achieves a performance that is at least a factor of $\beta$ of the performance of the optimal control solution, which satisfies Bellman's optimality principle. The factor $\beta$ depends on the specific ADP scheme, as we will explicitly characterize. To illustrate the applicability of our bounding technique, we present examples of ADP schemes, including the popular rollout method.
This paper discusses the odds problem, proposed by Bruss in 2000, and its variants. A recurrence relation called a dynamic programming (DP) equation is used to find an optimal stopping policy of the odds problem and its variants. In 2013, Buchbinder, Jain, and Singh proposed a linear programming (LP) formulation for finding an optimal stopping policy of the classical secretary problem, which is a special case of the odds problem. The proposed linear programming problem, which maximizes the probability of a win, differs from the long-established DP equations. This paper shows that an ordinary DP equation is a modification of the dual problem of linear programming, including the LP formulation proposed by Buchbinder, Jain, and Singh.
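
The kind of DP equation mentioned above can be illustrated with a minimal sketch, assuming the textbook setting of the odds problem: independent events with known success probabilities, where the goal is to stop on the last success. The function names `odds_rule_threshold` and `dp_stopping` are hypothetical, and the code shows the classical backward recursion and Bruss's odds rule, not the LP formulation or the specific equations of the paper.

```python
from fractions import Fraction


def odds_rule_threshold(p):
    """0-based index from which the odds rule stops on the first success.

    Sums the odds p[k] / (1 - p[k]) backward from the last event until
    the running sum reaches 1.
    """
    total = Fraction(0)
    for k in range(len(p) - 1, -1, -1):
        if p[k] == 1:  # infinite odds: the sum certainly reaches 1 here
            return k
        total += p[k] / (1 - p[k])
        if total >= 1:
            return k
    return 0  # the odds never sum to 1: stop on any success


def dp_stopping(p):
    """Backward DP for the problem of stopping on the last success.

    Returns (optimal win probability, smallest index at which stopping
    on a success is strictly better than continuing).
    """
    n = len(p)
    value = Fraction(0)     # optimal win probability from stage k + 1 onward
    no_later = Fraction(1)  # probability of no success after stage k
    threshold = n
    for k in range(n - 1, -1, -1):
        if no_later > value:  # stopping on a success at k beats continuing
            threshold = k
        # DP equation: on a success take the better of stop and continue;
        # on a failure the only option is to continue.
        value = p[k] * max(no_later, value) + (1 - p[k]) * value
        no_later *= 1 - p[k]
    return value, threshold


def secretary_probabilities(n):
    """Secretary problem as an odds problem: candidate k is best so far w.p. 1/k."""
    return [Fraction(1, k) for k in range(1, n + 1)]


if __name__ == "__main__":
    p = secretary_probabilities(10)
    win_probability, start = dp_stopping(p)
    print(float(win_probability), start, odds_rule_threshold(p))
```

Exact rational arithmetic via `fractions.Fraction` is a design choice here so that the DP threshold and the odds-rule threshold can be compared without floating-point ties.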
We consider the revenue management problem of finding profit-maximising prices for delivery time slots in the context of attended home delivery. This multi-stage optimal control problem admits a dynamic programming formulation that is intractable for realistic problem sizes due to the so-called curse of dimensionality. Therefore, we study three approximate dynamic programming algorithms both from a control-theoretical perspective and in a parametric numerical case study. Our numerical analysis is based on real-world data, from which we generate multiple scenarios to stress-test the robustness of the pricing policies to errors in model parameter estimates. Our theoretical analysis and numerical benchmark tests show that one of these algorithms, namely gradient-bounded dynamic programming, dominates the others with respect to computation time and profit-generation capabilities of the delivery slot pricing policies that it generates. Finally, we show that uncertainty in the estimates of the model parameters further increases the profit-generation dominance of this approach.
Chong Wang, Ping Ju, Shunbo Lei (2019)
Because failures in distribution systems caused by extreme weather events directly result in consumer outages, this paper proposes a state-based decision-making model with the objective of mitigating loss of load to improve the distribution system resilience throughout the unfolding events. The sequentially uncertain system states, e.g., feeder line on/off states, driven by the unfolding events are modeled as Markov states, and the transition probabilities from one Markov state to another throughout the unfolding events are determined by the component failures caused by the unfolding events. A recursive optimization model based on Markov decision processes (MDP) is developed to make state-based actions, i.e., system reconfiguration, at each decision time. To overcome the curse of dimensionality caused by enormous numbers of states and actions, an approximate dynamic programming (ADP) approach based on post-decision states and iteration is used to solve the proposed MDP-based model. The IEEE 33-bus system and the IEEE 123-bus system are used to validate the proposed model.