Solving stochastic optimal control problem via stochastic maximum principle with deep learning method

Published by: Ying Peng

Publication date: 2020

Research field: Informatics Engineering

Language: English

Authors: Shaolin Ji, Shige Peng, Ying Peng

Optimization and Control, Machine Learning

In this paper, we aim to solve the high-dimensional stochastic optimal control problem from the viewpoint of the stochastic maximum principle via deep learning. By introducing the extended Hamiltonian system, which is essentially an FBSDE with a maximum condition, we reformulate the original control problem as a new one. Three algorithms are proposed to solve the new control problem. Numerical results for different examples demonstrate the effectiveness of the proposed algorithms, especially in high-dimensional cases. An important application of this method is the calculation of sub-linear expectations, which correspond to a class of fully nonlinear PDEs.
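
For orientation, the extended Hamiltonian system mentioned in the abstract can be sketched in a standard textbook form. The notation here is assumed and is not taken from the paper: the state $X$, control $u$ with values in a set $U$, coefficients $b$ and $\sigma$, running cost $f$, terminal cost $h$, and the sign convention for $H$ are all illustrative. The minimum condition is written in its first-order form, which is valid under the usual convexity assumptions.

```latex
\begin{aligned}
& dX_t = b(t,X_t,u_t)\,dt + \sigma(t,X_t,u_t)\,dW_t, \qquad X_0 = x_0,\\
& J(u) = \mathbb{E}\Big[\int_0^T f(t,X_t,u_t)\,dt + h(X_T)\Big],\\
& H(t,x,u,p,q) = \langle p,\, b(t,x,u)\rangle
    + \operatorname{tr}\big(q^{\top}\sigma(t,x,u)\big) + f(t,x,u),\\
& dp_t = -H_x(t,X_t,u_t,p_t,q_t)\,dt + q_t\,dW_t, \qquad p_T = h_x(X_T),\\
& H(t,X_t^{*},u_t^{*},p_t,q_t) = \min_{u\in U} H(t,X_t^{*},u,p_t,q_t).
\end{aligned}
```

The forward equation for $X$ and the backward equation for $(p,q)$ together form the FBSDE, and the last line is the optimality condition that couples them.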

Na Li, Xun Li, Jing Peng, 2020

This paper applies a reinforcement learning (RL) method to solve infinite-horizon continuous-time stochastic linear quadratic problems, where the drift and diffusion terms in the dynamics may depend on both the state and the control. Based on Bellman's dynamic programming principle, an online RL algorithm is presented to attain the optimal control with only partial system information. This algorithm computes the optimal control directly rather than estimating the system coefficients and solving the related Riccati equation. It requires only local trajectory information, which greatly simplifies the computation. Two numerical examples are carried out to shed light on our theoretical findings.

Optimization and Control
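
To make concrete what the "related Riccati equation" in the entry above looks like, here is a minimal sketch of the classical model-based route for a scalar problem, the route that the RL algorithm is said to avoid. Everything in it is an assumption for illustration: the scalar dynamics dx = (a x + b u) dt + (c x + d u) dW, the cost E∫(q x² + r u²) dt with q, r > 0, the function names, and the parameter values. It is not the paper's algorithm.

```python
def riccati_residual(P, a, b, c, d, q, r):
    """Residual of the scalar stochastic algebraic Riccati equation
    (2a + c^2) P + q - ((b + c d) P)^2 / (r + d^2 P) = 0."""
    return (2 * a + c * c) * P + q - ((b + c * d) * P) ** 2 / (r + d * d * P)


def solve_scalar_riccati(a, b, c, d, q, r, iters=200):
    """Find the positive root by bisection.

    For q, r > 0 the residual is concave in P >= 0 and positive at P = 0,
    so it has at most one positive root.
    """
    lo, hi = 0.0, 1.0
    # Grow the bracket until the residual changes sign.
    while riccati_residual(hi, a, b, c, d, q, r) > 0:
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("no sign change found; problem may be ill-posed")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if riccati_residual(mid, a, b, c, d, q, r) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def feedback_gain(P, b, c, d, r):
    """Gain K of the linear feedback u = K x associated with P."""
    return -(b + c * d) * P / (r + d * d * P)


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only for illustration.
    a, b, c, d, q, r = 0.5, 1.0, 0.2, 0.1, 1.0, 1.0
    P = solve_scalar_riccati(a, b, c, d, q, r)
    K = feedback_gain(P, b, c, d, r)
    # Mean-square stability of the closed loop requires this to be negative.
    stability = 2 * (a + b * K) + (c + d * K) ** 2
    print(P, K, stability)
```

This route needs all of a, b, c, d to be known, which is the requirement the abstract's partial-information RL approach is meant to relax.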

Maximum Principle for Optimal Control of Neutral Stochastic Functional Differential Systems

Wenning Wei, 2013

In this paper, the optimal control problem for neutral stochastic functional differential equations (NSFDEs) is discussed. A class of so-called neutral backward stochastic functional equations of Volterra type (VNBSFEs) is introduced as the adjoint equation. The existence and uniqueness of solutions to the VNBSFE are established. The Pontryagin maximum principle is constructed for controlled NSFDEs with a Lagrange-type cost functional.

Optimization and Control

The Global Maximum Principle for Progressive Optimal Control of Partially Observed Forward-Backward Stochastic Systems with Random Jumps

Yueyang Zheng, Jingtao Shi, 2021

In this paper, we study a partially observed progressive optimal control problem of forward-backward stochastic differential equations with random jumps, where the control domain is not necessarily convex and the control variable enters into all the coefficients. In our model, the observation equation is driven not only by a Brownian motion but also by a Poisson random measure, and its noises are correlated with those of the state equation. As preparation, we first derive the existence and uniqueness of solutions to the fully coupled forward-backward stochastic system with random jumps in $L^2$-space and to the decoupled forward-backward stochastic system with random jumps in $L^\beta$ ($\beta>2$)-space, respectively. We then obtain the $L^\beta$ ($\beta\geq 2$)-estimate of solutions to the fully coupled forward-backward stochastic system, and the non-linear filtering equation of the partially observed stochastic system with random jumps. Next, we derive the partially observed global maximum principle with random jumps by a new hierarchical method. To show its applications, a partially observed linear quadratic progressive optimal control problem with random jumps is investigated by means of the maximum principle and stochastic filtering. A state estimate feedback representation of the optimal control is given in a more explicit form by introducing some ordinary differential equations.

Optimization and Control

Sparse optimal stochastic control

Kaito Ito, Takuya Ikeda, Kenji Kashima, 2021

In this paper, we investigate sparse optimal control of continuous-time stochastic systems. We adopt the dynamic programming approach and analyze the optimal control via the value function. Due to the non-smoothness of the $L^0$ cost functional, the value function is in general not differentiable in the domain. We therefore characterize the value function as a viscosity solution to the associated Hamilton-Jacobi-Bellman (HJB) equation. Based on this result, we derive a necessary and sufficient condition for $L^0$ optimality, which immediately gives the optimal feedback map. For control-affine systems in particular, we consider the relationship with the $L^1$ optimal control problem and show an equivalence theorem.

Optimization and Control, Systems and Control

Risk aware minimum principle for optimal control of stochastic differential equations

Jukka Isohatala, William B. Haskell, 2018

We present a probabilistic formulation of risk-aware optimal control problems for stochastic differential equations. In our framework, risk awareness is captured by objective functions in which the risk-neutral expectation is replaced by a risk function, a nonlinear functional of random variables that accounts for the controller's risk preferences. We state and prove a risk-aware minimum principle that is a parsimonious generalization of the well-known risk-neutral stochastic Pontryagin's minimum principle. As our main results, we give necessary and also sufficient conditions for optimality of control processes taking values on probability measures defined on a given action space. We show that, remarkably, going from the risk-neutral to the risk-aware case, the minimum principle is simply modified by the introduction of one additional real-valued stochastic process that acts as a risk adjustment factor for given cost rate and terminal cost functions. This adjustment process is explicitly given as the expectation, conditional on the filtration at the given time, of an appropriately defined functional derivative of the risk function evaluated at the random total cost. For our results we rely on the Fréchet differentiability of the risk function, and for completeness, we prove under mild assumptions the existence of Fréchet derivatives of some common risk functions. We give a simple application of the results to a portfolio allocation problem and show that the risk awareness of the objective function gives rise to a risk premium term that is characterized by the risk adjustment process described above. This suggests uses of our results in, e.g., the pricing of risk modeled by generic risk functions in financial applications.

Optimization and Control