No Arabic abstract
We consider the revenue management problem of finding profit-maximising prices for delivery time slots in the context of attended home delivery. This multi-stage optimal control problem admits a dynamic programming formulation that is intractable for realistic problem sizes due to the so-called curse of dimensionality. Therefore, we study three approximate dynamic programming algorithms both from a control-theoretical perspective and in a parametric numerical case study. Our numerical analysis is based on real-world data, from which we generate multiple scenarios to stress-test the robustness of the pricing policies to errors in model parameter estimates. Our theoretical analysis and numerical benchmark tests show that one of these algorithms, namely gradient-bounded dynamic programming, dominates the others with respect to computation time and profit-generation capabilities of the delivery slot pricing policies that it generates. Finally, we show that uncertainty in the estimates of the model parameters further increases the profit-generation dominance of this approach.
In this paper, we will develop a systematic approach to deriving guaranteed bounds for approximate dynamic programming (ADP) schemes in optimal control problems. Our approach is inspired by our recent results on bounding the performance of greedy strategies in optimization of string-submodular functions over a finite horizon. The approach is to derive a string-submodular optimization problem, for which the optimal strategy is the optimal control solution and the greedy strategy is the ADP solution. Using this approach, we show that any ADP solution achieves a performance that is at least a factor of $beta$ of the performance of the optimal control solution, which satisfies Bellmans optimality principle. The factor $beta$ depends on the specific ADP scheme, as we will explicitly characterize. To illustrate the applicability of our bounding technique, we present examples of ADP schemes, including the popular rollout method.
For years, there has been interest in approximation methods for solving dynamic programming problems, because of the inherent complexity in computing optimal solutions characterized by Bellmans principle of optimality. A wide range of approximate dynamic programming (ADP) methods now exists. It is of great interest to guarantee that the performance of an ADP scheme be at least some known fraction, say $beta$, of optimal. This paper introduces a general approach to bounding the performance of ADP methods, in this sense, in the stochastic setting. The approach is based on new results for bounding greedy solutions in string optimization problems, where one has to choose a string (ordered set) of actions to maximize an objective function. This bounding technique is inspired by submodularity theory, but submodularity is not required for establishing bounds. Instead, the bounding is based on quantifying certain notions of curvature of string functions; the smaller the curvatures the better the bound. The key insight is that any ADP scheme is a greedy scheme for some surrogate string objective function that coincides in its optimal solution and value with those of the original optimal control problem. The ADP scheme then yields to the bounding technique mentioned above, and the curvatures of the surrogate objective determine the value $beta$ of the bound. The surrogate objective and its curvatures depend on the specific ADP.
We present a novel linear program for the approximation of the dynamic programming cost-to-go function in high-dimensional stochastic control problems. LP approaches to approximate DP have typically relied on a natural `projection of a well studied linear program for exact dynamic programming. Such programs restrict attention to approximations that are lower bounds to the optimal cost-to-go function. Our program--the `smoothed approximate linear program--is distinct from such approaches and relaxes the restriction to lower bounding approximations in an appropriate fashion while remaining computationally tractable. Doing so appears to have several advantages: First, we demonstrate substantially superior bounds on the quality of approximation to the optimal cost-to-go function afforded by our approach. Second, experiments with our approach on a challenging problem (the game of Tetris) show that the approach outperforms the existing LP approach (which has previously been shown to be competitive with several ADP algorithms) by an order of magnitude.
On-demand delivery has become increasingly popular around the world. Brick-and-mortar grocery stores, restaurants, and pharmacies are providing fast delivery services to satisfy the growing home delivery demand. Motivated by a large meal and grocery delivery company, we model and solve a multiperiod driver dispatching and routing problem for last-mile delivery systems where on-time performance is the main target. The operator of this system needs to dispatch a set of drivers and specify their delivery routes in a stochastic environment, in which random demand arrives over a fixed number of periods. The resulting dynamic program is challenging to solve due to the curse of dimensionality. We propose a novel approximation framework to approximate the value function via a simplified dispatching program. We then develop efficient exact algorithms for this problem based on Benders decomposition and column generation. We validate the superior performance of our framework and algorithms via extensive numerical experiments. Tested on a real-world data set, we quantify the value of adaptive dispatching and routing in on-time delivery and highlight the need of coordinating these two decisions in a dynamic setting. We show that dispatching multiple vehicles with short trips is preferable for on-time delivery, as opposed to sending a few vehicles with long travel times.
Because failures in distribution systems caused by extreme weather events directly result in consumers outages, this paper proposes a state-based decision-making model with the objective of mitigating loss of load to improve the distribution system resilience throughout the unfolding events. The sequentially uncertain system states, e.g., feeder line on/off states, driven by the unfolding events are modeled as Markov states, and the probabilities from one Markov state to another Markov state throughout the unfolding events are determined by the component failure caused by the unfolding events. A recursive optimization model based on Markov decision processes (MDP) is developed to make state-based actions, i.e., system reconfiguration, at each decision time. To overcome the curse of dimensionality caused by enormous states and actions, an approximate dynamic programming (ADP) approach based on post-decision states and iteration is used to solve the proposed MDP-based model. IEEE 33-bus system and IEEE 123-bus system are used to validate the proposed model.