ﻻ يوجد ملخص باللغة العربية
This paper studies an accelerated fitted value iteration (FVI) algorithm to solve high-dimensional Markov decision processes (MDPs). FVI is an approximate dynamic programming algorithm that has desirable theoretical properties. However, it can be intractable when the action space is finite but vector-valued. To solve such MDPs via FVI, we first approximate the value functions by a two-layer neural network (NN) with rectified linear units (ReLU) being activation functions. We then verify that such approximators are strong enough for the MDP. To speed up the FVI, we recast the action selection problem as a two-stage stochastic programming problem, where the resulting recourse function comes from the two-layer NN. Then, the action selection problem is solved with a specialized multi-cut decomposition algorithm. More specifically, we design valid cuts by exploiting the structure of the approximated value functions to update the actions. We prove that the decomposition can find the global optimal solution in a finite number of iterations and the overall accelerated FVI is consistent. Finally, we verify the performance of the FVI algorithm via a multi-facility capacity investment problem (MCIP). A comprehensive numerical study is implemented, where the results show that the FVI is significantly accelerated without sacrificing too much in precision.
Derivative-free optimization (DFO) has recently gained a lot of momentum in machine learning, spawning interest in the community to design faster methods for problems where gradients are not accessible. While some attention has been given to the conc
This note provides upper bounds on the number of operations required to compute by value iterations a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number of states and actions. For a given discount fa
A multiplicative relative value iteration algorithm for solving the dynamic programming equation for the risk-sensitive control problem is studied for discrete time controlled Markov chains with a compact Polish state space, and controlled diffusions
We propose an accelerated meta-algorithm, which allows to obtain accelerated methods for convex unconstrained minimization in different settings. As an application of the general scheme we propose nearly optimal methods for minimizing smooth function
Natural conditions sufficient for weak continuity of transition probabilities in belief MDPs (Markov decision processes) were established in our paper published in Mathematics of Operations Research in 2016. In particular, the transition probability