Natural conditions sufficient for weak continuity of transition probabilities in belief MDPs (Markov decision processes) were established in our paper published in Mathematics of Operations Research in 2016. In particular, the transition probability in the belief MDP is weakly continuous if in the original MDP the transition probability is weakly continuous and the observation probability is continuous in total variation. These results imply sufficient conditions for the existence of optimal policies in POMDPs (partially observable MDPs) and provide computational methods for finding them. Recently Kara, Saldi, and Yuksel proved weak continuity of the transition probability for the belief MDP if the transition probability for the original MDP is continuous in total variation and the observation probability does not depend on controls. In this paper we show that the following two conditions imply weak continuity of transition probabilities for belief MDPs when observation probabilities depend on controls: (i) transition probabilities for the original MDP are continuous in total variation, and (ii) observation probabilities are measurable, and their dependence on controls is continuous in total variation.
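To make the reduction to a belief MDP concrete, the following is a minimal sketch of one belief (posterior) update for a finite toy POMDP in which the observation probability depends on the control. It is only an illustration: the paper works with general state, observation, and action spaces, and the names `belief_update`, `P`, and `Q`, as well as all numbers, are assumptions introduced here.

```python
def belief_update(belief, action, observation, P, Q):
    """One Bayes-filter step for a finite POMDP (hypothetical toy setting).

    belief[x]       : prior probability of state x
    P[a][x][x2]     : probability of moving from x to x2 under action a
    Q[a][x2][y]     : probability of observing y in new state x2 after action a
    Returns the posterior distribution over the new state.
    """
    n = len(belief)
    # Predict: push the prior belief through the transition probability.
    predicted = [
        sum(belief[x] * P[action][x][x2] for x in range(n)) for x2 in range(n)
    ]
    # Correct: weight each new state by the likelihood of the observation,
    # which here depends on the action as well as on the new state.
    unnormalized = [predicted[x2] * Q[action][x2][observation] for x2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    return [w / total for w in unnormalized]


# Toy data (assumed): two states, one action, two observations.
P = {0: [[0.9, 0.1], [0.2, 0.8]]}
Q = {0: [[0.8, 0.2], [0.3, 0.7]]}
posterior = belief_update([0.5, 0.5], action=0, observation=0, P=P, Q=Q)
```

The map from (belief, action) to the distribution of the next belief is the transition probability of the belief MDP whose weak continuity the abstract discusses.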
This paper describes the structure of optimal policies for infinite-state Markov Decision Processes with setwise continuous transition probabilities. The action sets may be noncompact. The objective criteria are either the expected total discounted and undiscounted costs or average costs per unit time. The analysis of optimality equations and inequalities is based on the optimal selection theorem for inf-compact functions introduced in this paper.
This paper studies an accelerated fitted value iteration (FVI) algorithm for solving high-dimensional Markov decision processes (MDPs). FVI is an approximate dynamic programming algorithm with desirable theoretical properties. However, it can be intractable when the action space is finite but vector-valued. To solve such MDPs via FVI, we first approximate the value functions by a two-layer neural network (NN) with rectified linear units (ReLU) as activation functions. We then verify that such approximators are strong enough for the MDP. To speed up the FVI, we recast the action selection problem as a two-stage stochastic programming problem, where the resulting recourse function comes from the two-layer NN. The action selection problem is then solved with a specialized multi-cut decomposition algorithm. More specifically, we design valid cuts by exploiting the structure of the approximated value functions to update the actions. We prove that the decomposition finds the globally optimal solution in a finite number of iterations and that the overall accelerated FVI is consistent. Finally, we verify the performance of the FVI algorithm on a multi-facility capacity investment problem (MCIP). A comprehensive numerical study shows that the FVI is significantly accelerated without sacrificing much precision.
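The action selection step can be illustrated with a brute-force baseline: evaluate a two-layer ReLU value approximator and enumerate every vector-valued action. This is a sketch under stated assumptions, not the multi-cut decomposition from the paper; the functions `relu_value`, `greedy_action`, `step`, and `cost`, the network weights, and the demand scenarios are all hypothetical.

```python
from itertools import product


def relu_value(state, W, b, v, c):
    """Two-layer ReLU network: V(s) = c + sum_j v[j] * max(0, W[j].s + b[j])."""
    total = c
    for w_row, b_j, v_j in zip(W, b, v):
        pre = sum(w * s for w, s in zip(w_row, state)) + b_j
        total += v_j * max(0.0, pre)
    return total


def greedy_action(state, levels, scenarios, cost, step, gamma, net):
    """Enumerate all len(levels) ** len(state) actions and keep the cheapest."""
    best_action, best_q = None, float("inf")
    for action in product(levels, repeat=len(state)):
        # Sample-average estimate of the expected next-stage value.
        future = sum(
            relu_value(step(state, action, w), *net) for w in scenarios
        ) / len(scenarios)
        q = cost(state, action) + gamma * future
        if q < best_q:
            best_action, best_q = action, q
    return best_action, best_q


# Toy capacity example (assumed): two facilities, add 0, 1, or 2 units each.
def step(state, action, demand):
    # Next state: capacity after investment, net of realized demand.
    return [s + a - d for s, a, d in zip(state, action, demand)]


def cost(state, action):
    # One unit of cost per unit of added capacity.
    return sum(action)


# Network that charges 4 per unit of shortfall at each facility.
net = ([[-1, 0], [0, -1]], [0, 0], [4, 4], 0.0)
best_action, best_q = greedy_action(
    [0, 0], (0, 1, 2), [[1, 1], [2, 0]], cost, step, 0.9, net
)
```

The enumeration grows exponentially with the number of action components, which is the kind of intractability the abstract attributes to vector-valued action spaces.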
This paper deals with control of partially observable discrete-time stochastic systems. It introduces and studies the class of Markov Decision Processes with Incomplete Information and with semi-uniform Feller transition probabilities. The important feature of this class of models is that the classic reduction of such a model with incomplete observation to the completely observable Markov Decision Process with belief states preserves semi-uniform Feller continuity of transition probabilities. Under mild assumptions on cost functions, optimal policies exist, optimality equations hold, and value iterations converge to optimal values for this class of models. In particular, for Partially Observable Markov Decision Processes the results of this paper imply new sufficient conditions, and generalize several known ones, on transition and observation probabilities for the existence of optimal policies, validity of optimality equations, and convergence of value iterations.
This paper studies average-cost Markov decision processes with semi-uniform Feller transition probabilities. This class of MDPs was recently introduced by the authors to study MDPs with incomplete information. This paper studies the validity of optimality inequalities, the existence of optimal policies, and the approximations of optimal policies by policies optimizing total discounted costs.
Meirowitz [17] showed existence of continuous behavioural function equilibria for Bayesian games with non-finite type and action spaces. A key condition for the proof of the existence result is equi-continuity of behavioural functions which, according to Meirowitz [17, page 215], is likely to fail or be difficult to verify. In this paper, we advance the research by presenting some verifiable conditions for the required equi-continuity, namely some growth conditions of the expected utility functions of each player at equilibria. In the case when the growth is of second order, we demonstrate that the condition is guaranteed by strong concavity of the utility function. Moreover, by using recent research on polynomial decision rules and optimal discretization approaches in stochastic and robust optimization, we propose some approximation schemes for the Bayesian equilibrium problem: first, by restricting the behavioural functions to polynomial functions of certain order over the space of types, we demonstrate that solving a Bayesian polynomial behavioural function equilibrium reduces to solving a finite dimensional stochastic equilibrium problem; second, we apply the optimal quantization method due to Pflug and Pichler [18] to develop an effective discretization scheme for solving the latter. Error bounds are derived for the respective approximation schemes under moderate conditions, and both academic examples and numerical results are presented to explain the Bayesian equilibrium problem and its approximation schemes.