This paper presents machine learning techniques and deep reinforcement learning-based algorithms for the efficient resolution of nonlinear partial differential equations and dynamic optimization problems arising in investment decisions and derivative pricing in financial engineering. We survey recent results in the literature, present new developments, notably in the fully nonlinear case, and compare the different schemes, illustrated by numerical tests on various financial applications. We conclude by highlighting some future research directions.
We propose a numerical method for solving high-dimensional fully nonlinear partial differential equations (PDEs). Our algorithm estimates the solution and its gradient simultaneously by backward time induction, using multi-layer neural networks, while the Hessian is approximated by automatic differentiation of the gradient computed at the previous step. This methodology extends to the fully nonlinear case the approach recently proposed in [HPW19] for semi-linear PDEs. Numerical tests illustrate the performance and accuracy of our method on several high-dimensional examples with nonlinearity in the Hessian term, including a linear-quadratic control problem with control on the diffusion coefficient, the Monge-Ampère equation, and a Hamilton-Jacobi-Bellman equation in portfolio optimization.
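A minimal sketch of one backward step of such a scheme, in a PyTorch setting, is given below; it is an illustration rather than the authors' implementation. Two networks approximate the solution u(t_i, ·) and its gradient, while the Hessian entering the nonlinearity is obtained by automatic differentiation of the gradient network trained at the previous (later) time step. The networks, the toy nonlinearity and all sizes are illustrative assumptions.

```python
# Hedged sketch of one backward step (illustrative, not the authors' code).
import torch
import torch.nn as nn

d, batch, dt = 10, 256, 0.01                 # dimension, sample size, time step

def mlp(d_in, d_out, width=64):
    # small feed-forward network used for the solution and for its gradient
    return nn.Sequential(nn.Linear(d_in, width), nn.Tanh(),
                         nn.Linear(width, width), nn.Tanh(),
                         nn.Linear(width, d_out))

u_net, z_net = mlp(d, 1), mlp(d, d)          # approximate u(t_i, .) and D_x u(t_i, .)
z_net_prev = mlp(d, d)                       # gradient network already trained at t_{i+1}
opt = torch.optim.Adam(list(u_net.parameters()) + list(z_net.parameters()), lr=1e-3)

def hessian_from_prev_gradient(x):
    # Hessian approximation: automatic differentiation of the gradient
    # network obtained at the previous (later) time step.
    x = x.requires_grad_(True)
    z = z_net_prev(x)
    rows = [torch.autograd.grad(z[:, k].sum(), x, retain_graph=True)[0] for k in range(d)]
    return torch.stack(rows, dim=1)          # shape (batch, d, d)

x = torch.randn(batch, d)                    # placeholder for simulated forward samples X_{t_i}
gamma = hessian_from_prev_gradient(x).detach()

# One stochastic gradient step on a schematic backward regression loss:
#   u(t_i, X_i) ≈ u(t_{i+1}, X_{i+1}) + dt * F(x, u, Du, D²u) - Du · dW
dW = torch.randn(batch, d) * dt ** 0.5
u_next = torch.zeros(batch, 1)               # placeholder for the estimate trained at t_{i+1}
F = gamma.diagonal(dim1=1, dim2=2).sum(-1, keepdim=True)   # toy nonlinearity: trace(D²u)
target = u_next + dt * F - (z_net(x) * dW).sum(-1, keepdim=True)
loss = ((u_net(x) - target) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```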
In a market with transaction costs, the price of a derivative can be expressed in terms of (preconsistent) price systems (following Kusuoka (1995)). In this paper, we consider a market with a binomial model for the stock price and discuss how to generate the price systems. From this, the pricing formula for a derivative can be reformulated as a stochastic control problem, and a dynamic programming approach can then be used to compute the price. We also discuss optimization of expected utility using price systems.
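One possible discretized version of such a dynamic programming reformulation is sketched below, under simplifying assumptions (zero interest rate, a proportional transaction cost, a cash-settled call, and a grid of shadow prices); the paper's exact formulation may differ. The control at each node is the price-system value, kept inside the bid-ask band around the binomial stock price, together with the one-step martingale measure.

```python
# Hedged sketch: DP over price systems in a recombining binomial tree.
import numpy as np
from functools import lru_cache

S0, u, d, T, lam = 100.0, 1.1, 0.9, 3, 0.01      # binomial model, 3 steps, cost lam
ratios = np.linspace(1 - lam, 1 + lam, 21)       # grid for the shadow ratio S_tilde / S
payoff = lambda s: max(s - 100.0, 0.0)           # cash-settled call (assumption)

@lru_cache(maxsize=None)
def value(k, j, ir):
    """Value at step k, after j up-moves, with current shadow ratio ratios[ir]."""
    S = S0 * u ** j * d ** (k - j)
    if k == T:
        return payoff(S)
    r, best = ratios[ir], -np.inf
    for iu, ru in enumerate(ratios):             # shadow ratio after an up-move
        for idn, rd in enumerate(ratios):        # shadow ratio after a down-move
            su, sd = ru * S * u, rd * S * d
            if not (sd < r * S < su):            # martingale constraint must be feasible
                continue
            q = (r * S - sd) / (su - sd)         # unique q with r*S = q*su + (1-q)*sd
            best = max(best, q * value(k + 1, j + 1, iu) + (1 - q) * value(k + 1, j, idn))
    return best

# Upper price: also optimize over the initial shadow ratio.
print(max(value(0, 0, ir) for ir in range(len(ratios))))
```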
We propose to take advantage of the knowledge of the characteristic function of the swap rate process, as modelled in the LIBOR Market Model with Stochastic Volatility and Displaced Diffusion (DDSVLMM), to derive analytical expressions for the gradient of swaption prices with respect to the model parameters. We use this result to derive an efficient calibration method for the DDSVLMM based on gradient-based optimization algorithms. Our study relies on and extends the work of Cui et al. (2017), who developed the analytical gradient for fast calibration of the Heston model, based on an alternative formulation of the Heston moment generating function proposed by del Baño et al. (2010). Our main conclusion is that analytical gradient-based calibration is highly competitive for the DDSVLMM, as it significantly limits the number of steps in the optimization algorithm while improving its accuracy. The efficiency of this novel approach is compared to that of standard optimization procedures.
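The structure of the method can be sketched as follows. To keep the example short, a toy lognormal characteristic function stands in for the DDSVLMM/Heston one and only a single volatility parameter is calibrated; what the snippet illustrates is the principle of computing the price and its parameter gradient from the same Fourier integrals and feeding both to a gradient-based least-squares optimizer. All names and values are illustrative assumptions.

```python
# Hedged sketch of analytical-gradient calibration via a characteristic function.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import least_squares

S0, T = 100.0, 1.0                       # zero rates for simplicity

def phi(u, sigma):
    # characteristic function of ln S_T under the toy lognormal model
    return np.exp(1j * u * (np.log(S0) - 0.5 * sigma**2 * T) - 0.5 * u**2 * sigma**2 * T)

def dphi_dsigma(u, sigma):
    # analytical derivative of the characteristic function with respect to sigma
    return -sigma * T * (1j * u + u**2) * phi(u, sigma)

def call_and_grad(K, sigma):
    k = np.log(K)
    def P_and_dP(shift, norm):
        # Heston-style probabilities: P1 (shift=-1j, norm=S0) and P2 (shift=0, norm=1)
        f  = lambda u: (np.exp(-1j*u*k) * phi(u + shift, sigma) / (1j*u*norm)).real
        df = lambda u: (np.exp(-1j*u*k) * dphi_dsigma(u + shift, sigma) / (1j*u*norm)).real
        P  = 0.5 + quad(f, 1e-8, 200, limit=200)[0] / np.pi
        dP = quad(df, 1e-8, 200, limit=200)[0] / np.pi
        return P, dP
    P1, dP1 = P_and_dP(-1j, S0)
    P2, dP2 = P_and_dP(0.0, 1.0)
    return S0 * P1 - K * P2, S0 * dP1 - K * dP2   # price and d(price)/d(sigma)

strikes = np.array([80.0, 90.0, 100.0, 110.0, 120.0])
market  = np.array([call_and_grad(K, 0.25)[0] for K in strikes])   # synthetic quotes

def residuals(p):
    return np.array([call_and_grad(K, p[0])[0] for K in strikes]) - market

def jacobian(p):
    return np.array([[call_and_grad(K, p[0])[1]] for K in strikes])

fit = least_squares(residuals, x0=[0.4], jac=jacobian)
print(fit.x)   # should recover a volatility close to 0.25
```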
A new definition of continuous-time equilibrium controls is introduced. As opposed to the standard definition, which involves a derivative-type operation, the new definition parallels the way a discrete-time equilibrium is defined and allows for an unambiguous economic interpretation. The terms strong equilibria and weak equilibria are coined for controls under the new and the standard definitions, respectively. When the state process is a time-homogeneous continuous-time Markov chain, a careful asymptotic analysis gives complete characterizations of weak and strong equilibria. Thanks to the Kakutani-Fan fixed-point theorem, general existence of weak and strong equilibria is also established under an additional compactness assumption. Our theoretical results are applied to a two-state model under non-exponential discounting. In particular, we demonstrate explicitly that there can be an incentive to deviate from a weak equilibrium, which justifies the need for strong equilibria. Our analysis also provides new results on the existence and characterization of discrete-time equilibria under infinite horizon.
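For concreteness, in one common formulation from the time-inconsistent control literature (the notation here is illustrative, not the paper's exact statement), let $\hat u$ be a candidate control and let $u^{a,\varepsilon}$ apply the action $a$ on $[t, t+\varepsilon)$ and revert to $\hat u$ afterwards. The standard (weak) definition requires
$$\liminf_{\varepsilon \downarrow 0} \frac{J(t,x;\hat u) - J(t,x;u^{a,\varepsilon})}{\varepsilon} \;\ge\; 0 \quad \text{for every admissible } a,$$
whereas the new (strong) definition requires that for every $a$ there exists $\varepsilon_0 > 0$ such that $J(t,x;\hat u) \ge J(t,x;u^{a,\varepsilon})$ for all $\varepsilon \in (0, \varepsilon_0]$, mirroring the discrete-time requirement that no one-step deviation is profitable.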
This paper develops algorithms for high-dimensional stochastic control problems based on deep learning and dynamic programming. Unlike classical approximate dynamic programming approaches, we first approximate the optimal policy by means of neural networks, in the spirit of deep reinforcement learning, and then the value function by Monte Carlo regression. This is achieved in the dynamic programming recursion by performance or hybrid iteration, combined with regress-now methods from numerical probability. We provide a theoretical justification of these algorithms. Consistency and rates of convergence for the control and value function estimates are analyzed and expressed in terms of the universal approximation error of the neural networks and of the statistical error in estimating the network functions, leaving aside the optimization error. Numerical results on various applications are presented in a companion paper (arxiv.org/abs/1812.05916) and illustrate the performance of the proposed algorithms.
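A hedged sketch of one backward step of such a hybrid scheme is given below (illustrative PyTorch-style code with generic dynamics and cost, not the paper's implementation): the policy at time n is first learned by minimizing the estimated cost-to-go, and the value function at time n is then fitted by "regress-now" Monte Carlo regression on the realized targets.

```python
# Hedged sketch of one backward step: policy learning, then value regression.
import torch
import torch.nn as nn

d, batch, dt = 5, 512, 0.1

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, 32), nn.ReLU(),
                         nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, d_out))

policy_n, value_n = mlp(d, d), mlp(d, 1)                   # networks for time step n
value_next = lambda x: (x ** 2).sum(-1, keepdim=True)      # placeholder for trained V_{n+1}

def dynamics(x, a, noise):
    return x + a * dt + noise                              # illustrative controlled dynamics

def running_cost(x, a):
    return ((x ** 2).sum(-1, keepdim=True) + (a ** 2).sum(-1, keepdim=True)) * dt

x = torch.randn(batch, d)                                  # training points for X_n

# 1) policy step: minimize the estimated cost-to-go over the policy network
opt_pi = torch.optim.Adam(policy_n.parameters(), lr=1e-3)
for _ in range(200):
    noise = torch.randn(batch, d) * dt ** 0.5
    a = policy_n(x)
    loss_pi = (running_cost(x, a) + value_next(dynamics(x, a, noise))).mean()
    opt_pi.zero_grad(); loss_pi.backward(); opt_pi.step()

# 2) value step ("regress now"): fit V_n(x) to the realized cost-to-go
opt_v = torch.optim.Adam(value_n.parameters(), lr=1e-3)
for _ in range(200):
    noise = torch.randn(batch, d) * dt ** 0.5
    with torch.no_grad():
        a = policy_n(x)
        target = running_cost(x, a) + value_next(dynamics(x, a, noise))
    loss_v = ((value_n(x) - target) ** 2).mean()
    opt_v.zero_grad(); loss_v.backward(); opt_v.step()
```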