ﻻ يوجد ملخص باللغة العربية
In this paper, we will develop a systematic approach to deriving guaranteed bounds for approximate dynamic programming (ADP) schemes in optimal control problems. Our approach is inspired by our recent results on bounding the performance of greedy strategies in optimization of string-submodular functions over a finite horizon. The approach is to derive a string-submodular optimization problem, for which the optimal strategy is the optimal control solution and the greedy strategy is the ADP solution. Using this approach, we show that any ADP solution achieves a performance that is at least a factor of $beta$ of the performance of the optimal control solution, which satisfies Bellmans optimality principle. The factor $beta$ depends on the specific ADP scheme, as we will explicitly characterize. To illustrate the applicability of our bounding technique, we present examples of ADP schemes, including the popular rollout method.
For years, there has been interest in approximation methods for solving dynamic programming problems, because of the inherent complexity in computing optimal solutions characterized by Bellmans principle of optimality. A wide range of approximate dyn
We present a novel linear program for the approximation of the dynamic programming cost-to-go function in high-dimensional stochastic control problems. LP approaches to approximate DP have typically relied on a natural `projection of a well studied l
We consider the revenue management problem of finding profit-maximising prices for delivery time slots in the context of attended home delivery. This multi-stage optimal control problem admits a dynamic programming formulation that is intractable for
In this paper, we are concerned with the valuation of Guaranteed Annuity Options (GAOs) under the most generalised modelling framework where both interest and mortality rates are stochastic and correlated. Pricing these type of options in the correla
We are concerned with the linear-quadratic optimal stochastic control problem with random coefficients. Under suitable conditions, we prove that the value field $V(t,x,omega), (t,x,omega)in [0,T]times R^ntimes Omega$, is quadratic in $x$, and has the