
Deep neural networks algorithms for stochastic control problems on finite horizon: convergence analysis

Posted by Nicolas Langrené
Publication date: 2018
Language: English
Author: Côme Huré





This paper develops algorithms for high-dimensional stochastic control problems based on deep learning and dynamic programming. Unlike classical approximate dynamic programming approaches, we first approximate the optimal policy by means of neural networks, in the spirit of deep reinforcement learning, and then the value function by Monte Carlo regression. This is achieved in the dynamic programming recursion by performance or hybrid iteration and regress-now methods from numerical probability. We provide a theoretical justification of these algorithms. Consistency and rates of convergence for the control and value function estimates are analyzed and expressed in terms of the universal approximation error of the neural networks and of the statistical error incurred when estimating the network functions, leaving aside the optimization error. Numerical results on various applications are presented in a companion paper (arxiv.org/abs/1812.05916) and illustrate the performance of the proposed algorithms.
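As a rough illustration of the scheme described above, the Python sketch below implements a hybrid-now style backward recursion under simplifying assumptions: at each time step a policy network is trained against the previously estimated value function, and the value function is then fitted by Monte Carlo regression (regress-now) on the simulated one-step costs. The dynamics f, running cost c, terminal cost g, network sizes, and training hyperparameters are placeholders, not the paper's actual choices.

# Minimal sketch of a hybrid-now backward recursion (assumed interfaces:
# f(x, a, noise) -> next state, c(x, a) -> running cost per sample,
# g(x) -> terminal cost per sample; all differentiable torch callables).
import torch
import torch.nn as nn

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_out))

def hybrid_now(f, c, g, d_x, d_a, N, n_samples=4096, epochs=200):
    value_next = g                                # V_N = terminal cost
    policies, values = [None] * N, [None] * N
    for n in reversed(range(N)):                  # backward over time steps
        x = torch.randn(n_samples, d_x)           # training distribution for X_n
        noise = torch.randn(n_samples, d_x)       # exogenous noise driving X_{n+1}
        pi, v = mlp(d_x, d_a), mlp(d_x, 1)
        opt_pi = torch.optim.Adam(pi.parameters(), lr=1e-3)
        for _ in range(epochs):                   # 1) policy step: minimize one-step cost-to-go
            a = pi(x)
            loss = (c(x, a) + value_next(f(x, a, noise))).mean()
            opt_pi.zero_grad(); loss.backward(); opt_pi.step()
        with torch.no_grad():                     # 2) regress-now Monte Carlo targets
            target = c(x, pi(x)) + value_next(f(x, pi(x), noise))
        opt_v = torch.optim.Adam(v.parameters(), lr=1e-3)
        for _ in range(epochs):                   # fit V_n by least-squares regression
            reg = ((v(x).squeeze(-1) - target) ** 2).mean()
            opt_v.zero_grad(); reg.backward(); opt_v.step()
        policies[n], values[n] = pi, v
        value_next = lambda y, v=v: v(y).squeeze(-1)
    return policies, values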




Read also

86 - Yan Yan, Yi Xu, Qihang Lin (2019)
Previous studies on stochastic primal-dual algorithms for solving min-max problems with faster convergence heavily rely on the bilinear structure of the problem, which restricts their applicability to a narrow range of problems. The main contribution of this paper is the design and analysis of new stochastic primal-dual algorithms that use a mixture of stochastic gradient updates and a logarithmic number of deterministic dual updates to solve a family of convex-concave problems with no bilinear structure assumed. Convergence rates faster than $O(1/\sqrt{T})$, with $T$ being the number of stochastic gradient updates, are established under mild conditions on the involved functions of the primal and dual variables. For example, for a family of problems that enjoy weak strong convexity in the primal variable and have a strongly concave function of the dual variable, the convergence rate of the proposed algorithm is $O(1/T)$. We also investigate the effectiveness of the proposed algorithms for learning robust models and for empirical AUC maximization.
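To make the update pattern concrete, here is a minimal sketch of the idea as described: stochastic gradient descent-ascent steps, interleaved with only O(log T) deterministic (full-gradient) dual updates. The oracle names grad_x_stoch, grad_y_stoch, and grad_y_full, the step size, and the power-of-two schedule are illustrative assumptions, not the paper's algorithm.

# Minimal sketch of the mixed update scheme: stochastic primal/dual gradient
# steps, with deterministic (full) dual updates at O(log T) power-of-two times.
# grad_x_stoch, grad_y_stoch, grad_y_full are assumed user-supplied oracles.
import math
import numpy as np

def mixed_primal_dual(grad_x_stoch, grad_y_stoch, grad_y_full, x, y, T, eta=0.01):
    full_steps = {2 ** k for k in range(int(math.log2(T)) + 1)}
    for t in range(1, T + 1):
        x = x - eta * grad_x_stoch(x, y)          # stochastic primal descent step
        if t in full_steps:                       # logarithmically many exact dual steps
            y = y + eta * grad_y_full(x, y)
        else:
            y = y + eta * grad_y_stoch(x, y)      # stochastic dual ascent step
    return x, y

# Toy instance: f(x, y) = E[(x - z)^2] * y - y^2 / 2 with z ~ N(0, 1)
rng = np.random.default_rng(0)
xT, yT = mixed_primal_dual(
    lambda x, y: 2 * (x - rng.standard_normal()) * y,   # unbiased grad in x
    lambda x, y: (x - rng.standard_normal()) ** 2 - y,  # unbiased grad in y
    lambda x, y: x ** 2 + 1.0 - y,                      # exact grad in y
    1.0, 0.0, T=2000)
print(xT, yT)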
A stochastic model predictive control (SMPC) approach is presented for discrete-time linear systems with arbitrary time-invariant probabilistic uncertainties and additive Gaussian process noise. Closed-loop stability of the SMPC approach is established by appropriate selection of the cost function. Polynomial chaos is used for uncertainty propagation through the system dynamics. The performance of the SMPC approach is demonstrated using the Van de Vusse reactions.
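As a toy illustration of the polynomial-chaos propagation mentioned above, the sketch below pushes a scalar linear state x_{k+1} = a(xi) x_k + u with an uncertain gain a(xi) = a0 + a1*xi, xi standard Gaussian, through a few steps via Galerkin projection onto probabilists' Hermite polynomials. All dimensions, orders, and numbers are assumptions for illustration, not the paper's setup.

# Toy polynomial-chaos (Galerkin) propagation for x_{k+1} = a(xi) * x_k + u,
# a(xi) = a0 + a1 * xi with xi ~ N(0, 1); x_k is expanded in probabilists'
# Hermite polynomials He_0..He_P.
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

P = 3                                             # truncation order of the expansion
nodes, weights = hermegauss(10)                   # Gauss-Hermite quadrature (weight e^{-x^2/2})
weights = weights / np.sqrt(2 * np.pi)            # normalize to the N(0,1) measure

def he(i, xi):                                    # evaluate the i-th Hermite polynomial
    c = np.zeros(i + 1); c[i] = 1.0
    return hermeval(xi, c)

norms = np.array([np.sum(weights * he(i, nodes) ** 2) for i in range(P + 1)])  # E[He_i^2] = i!

def step(coeffs, a0, a1, u):
    """Galerkin projection: new_i = E[a(xi) x(xi) He_i(xi)] / E[He_i^2], plus u on He_0."""
    x_vals = sum(coeffs[j] * he(j, nodes) for j in range(P + 1))
    a_vals = a0 + a1 * nodes
    proj = np.array([np.sum(weights * a_vals * x_vals * he(i, nodes)) for i in range(P + 1)])
    return proj / norms + np.array([u] + [0.0] * P)

coeffs = np.array([1.0, 0.0, 0.0, 0.0])           # deterministic initial state x_0 = 1
for _ in range(5):
    coeffs = step(coeffs, a0=0.9, a1=0.05, u=0.1)
print("mean:", coeffs[0], "variance:", np.sum(coeffs[1:] ** 2 * norms[1:]))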
In this paper, we consider the optimal stopping problem on semi-Markov processes (SMPs) with finite horizon, and aim to establish the existence and computation of optimal stopping times. To achieve this goal, we first extend the main results on finite horizon semi-Markov decision processes (SMDPs) to the case with additional terminal costs, introduce an explicit construction of SMDPs, and prove the equivalence between the optimal stopping problems on SMPs and SMDPs. Then, using this equivalence and the results on SMDPs developed here, we not only show the existence of optimal stopping times on SMPs, but also provide an algorithm for computing them. Moreover, we show that the optimal and ε-optimal stopping times can be characterized as the hitting times of certain special sets, respectively.
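The computational core behind such results is finite-horizon backward induction; the sketch below shows it for the simplified case of a finite-state, discrete-time chain (the paper treats semi-Markov processes, which additionally carry random sojourn times). The transition matrix P, stopping reward g, and running cost c are made-up inputs.

# Backward induction for finite-horizon optimal stopping on a finite-state,
# discrete-time chain: V_n = max(g, -c + P @ V_{n+1}), and we stop wherever
# g attains the max; the stopping regions play the role of the "special sets".
import numpy as np

def optimal_stopping(P, g, c, N):
    V = g.copy()                                  # at the horizon we must stop
    stop = [None] * N
    for n in reversed(range(N)):
        cont = -c + P @ V                         # expected value of continuing one step
        stop[n] = g >= cont                       # stopping region at time n
        V = np.maximum(g, cont)
    return V, stop

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])                        # transition matrix of a 2-state chain
V0, regions = optimal_stopping(P, g=np.array([1.0, 2.0]), c=np.array([0.1, 0.1]), N=5)
print("value:", V0, "stop at t=0 in states:", np.flatnonzero(regions[0]))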
We analyze the convergence rate of gradient flows on objective functions induced by Dropout and Dropconnect, when applying them to shallow linear Neural Networks (NNs) - which can also be viewed as doing matrix factorization using a particular regularizer. Dropout algorithms such as these are thus regularization techniques that use {0,1}-valued random variables to filter weights during training in order to avoid coadaptation of features. By leveraging a recent result on nonconvex optimization and conducting a careful analysis of the set of minimizers as well as the Hessian of the loss function, we are able to obtain (i) a local convergence proof of the gradient flow and (ii) a bound on the convergence rate that depends on the data, the dropout probability, and the width of the NN. Finally, we compare this theoretical bound to numerical simulations, which are in qualitative agreement with the convergence bound and match it when starting sufficiently close to a minimizer.
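For intuition, the following sketch trains a shallow linear NN with Dropout in the sense described: a {0,1}-valued Bernoulli filter b multiplies the hidden layer at every step, so in expectation the objective is a regularized matrix factorization. The dimensions, dropout probability, step size, and synthetic data are illustrative assumptions.

# Toy Dropout training of a shallow linear NN y = U @ (b * (V @ x)), where b
# is a {0,1}-valued Bernoulli(p) filter on the hidden layer.
import numpy as np

rng = np.random.default_rng(0)
d, k, n, p, eta = 5, 3, 200, 0.8, 0.02
X = rng.standard_normal((n, d))                   # inputs
Y = X @ rng.standard_normal((d, d)).T             # targets from a random linear map
U = 0.3 * rng.standard_normal((d, k))             # output layer
V = 0.3 * rng.standard_normal((k, d))             # hidden layer

for _ in range(5000):
    b = rng.binomial(1, p, size=k)                # resample the {0,1} filter each step
    H = (X @ V.T) * b                             # hidden activations with dropout
    R = H @ U.T - Y                               # residuals of the filtered network
    gU = (R.T @ H) / n                            # gradients of the mean squared loss
    gV = (((R @ U) * b).T @ X) / n
    U -= eta * gU
    V -= eta * gV
print("loss at the mean network:", np.mean((p * (X @ V.T) @ U.T - Y) ** 2))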
In this paper, we study the global convergence of majorization-minimization (MM) algorithms for solving nonconvex regularized optimization problems. MM algorithms have received great attention in machine learning. However, when applied to nonconvex optimization problems, the convergence of MM algorithms is a challenging issue. We introduce the theory of the Kurdyka-Łojasiewicz inequality to address this issue. In particular, we show that many nonconvex problems enjoy the Kurdyka-Łojasiewicz property and establish the global convergence result of the corresponding MM procedure. We also extend our result to the well-known concave-convex procedure (CCCP).
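As a concrete instance of an MM procedure on a nonconvex regularized problem, the sketch below tackles least squares with the concave log penalty sum_j lam*log(1 + |w_j|/eps): at each outer step the penalty is majorized by its tangent, which yields a weighted-L1 surrogate solved by proximal gradient. All parameter values are assumptions; the connection to CCCP is that the concave part is linearized at each iterate.

# MM sketch: majorize the concave log penalty by its tangent (a weighted L1
# norm) and solve each surrogate by proximal gradient with step 1/L.
import numpy as np

def soft(z, t):                                   # proximal operator of weighted L1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def mm_log_penalty(A, y, lam=0.1, eps=0.1, outer=15, inner=500):
    w = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of the LS gradient
    for _ in range(outer):
        weights = lam / (eps + np.abs(w))         # tangent slopes of the concave penalty
        for _ in range(inner):                    # proximal gradient on the surrogate
            w = soft(w - A.T @ (A @ w - y) / L, weights / L)
        # by the MM property, each outer step decreases the nonconvex objective
    return w

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 100))
w_true = np.zeros(100); w_true[:5] = 3.0
w_hat = mm_log_penalty(A, A @ w_true)
print("largest coefficients at indices:", np.argsort(-np.abs(w_hat))[:5])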