Deep adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints

Published by Jingliang Duan

Publication date: 2019

Research field: Electronic Engineering; Information Engineering

Paper language: English

Authors: Jingliang Duan, Zhengyu Liu, Shengbo Eben Li

Abstract

This paper presents a constrained deep adaptive dynamic programming (CDADP) algorithm to solve general nonlinear optimal control problems with known dynamics. Unlike previous ADP algorithms, it can directly handle problems with state constraints. Both the policy and the value function are approximated by deep neural networks (NNs), which map the system state directly to the action and to the value, respectively, without requiring hand-crafted basis functions. The proposed algorithm handles the state constraints by formulating the policy improvement step as a constrained optimization problem, to which a trust region constraint is added to prevent excessively large policy updates. We first linearize this constrained optimization problem locally into a quadratically constrained quadratic programming (QCQP) problem and then obtain the optimal update of the policy network parameters by solving its dual problem. We also propose a series of recovery rules to update the policy when the primal problem is infeasible. In addition, parallel learners are employed to explore different regions of the state space, which stabilizes and accelerates learning. A path-tracking vehicle control problem is used to demonstrate the effectiveness of the proposed method.
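The policy-update step described in the abstract can be sketched for its simplest case. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a single linearized state constraint b·x + c <= 0 on the parameter step x, a linearized objective g·x, and a quadratic trust region 1/2 x^T H x <= delta with a positive-definite H. With only one constraint, the dual can be maximized in closed form (the same reduction used in constrained policy optimization, CPO). The function name `trust_region_step` and the recovery step taken when the primal problem is infeasible are hypothetical illustrations.

```python
import numpy as np


def trust_region_step(g, b, c, H, delta, eps=1e-12):
    """Solve  min_x g@x  s.t.  b@x + c <= 0  and  0.5 * x@H@x <= delta  via its dual.

    Hypothetical single-constraint setup: H symmetric positive definite, b nonzero.
    Returns (step, feasible_flag).
    """
    Hg = np.linalg.solve(H, g)  # H^{-1} g
    Hb = np.linalg.solve(H, b)  # H^{-1} b
    q, r, s = g @ Hg, g @ Hb, b @ Hb

    # Infeasible primal: even the best trust-region step cannot reach b@x + c <= 0.
    if c > 0 and c * c > 2 * delta * s:
        # Hypothetical recovery rule: largest trust-region step that only decreases b@x + c.
        return -np.sqrt(2 * delta / s) * Hb, False

    # Case 1: constraint inactive (nu = 0), plain trust-region step along -H^{-1} g.
    lam = max(np.sqrt(q / (2 * delta)), eps)
    x = -Hg / lam
    if b @ x + c <= 0:
        return x, True

    # Case 2: constraint active; closed-form maximizers of the dual in (lam, nu).
    A = max(q - r * r / s, eps)          # >= 0 by Cauchy-Schwarz
    B = max(2 * delta - c * c / s, eps)  # >= 0 whenever the primal is feasible
    lam = np.sqrt(A / B)
    nu = max((lam * c - r) / s, 0.0)     # clip only guards against round-off
    return -(Hg + nu * Hb) / lam, True
```

For H = I and delta = 0.5 (a unit-radius trust region), minimizing x0 + x1 subject to x0 >= -0.5 returns approximately (-0.5, -0.866), the point where the constraint line meets the trust-region boundary. If the constraint lies entirely outside the trust region, the sketch returns the recovery step and a `False` flag instead.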

Yangguang Yu, Xiangke Wang, Zhiyong Sun (2020)

In this paper, a new design scheme is presented to solve the optimal control problem for nonlinear systems with unsymmetrical input constraints. The method also relaxes an assumption made in current studies of adaptive optimal control, namely that the internal dynamics must be zero when the system state is at the origin. In particular, a partially unknown system is investigated, and the procedure for obtaining the corresponding optimal control policy is introduced. The optimality of the obtained control policy and the stability of the closed-loop dynamics are proved theoretically. Moreover, the proposed method can be further applied to nonlinear control systems whose dynamics are completely known or completely unknown. We also apply the proposed control design framework to the optimal circumnavigation problem involving a moving target for a fixed-wing unmanned aerial vehicle (UAV). The control performance of our method is compared with that of an existing circumnavigation control law in a numerical simulation, and the results validate the effectiveness of our algorithm.

Systems and Control

Adaptive Smoothing Path Integral Control

Dominik Thalmeier, Hilbert J. Kappen, Simone Totaro (2020)

In path integral control problems, a representation of an optimally controlled dynamical system can be formally computed and serve as a guidepost for learning a parametrized policy. The Path Integral Cross-Entropy (PICE) method tries to exploit this but is hampered by poor sample efficiency. We propose a model-free algorithm called ASPIC (Adaptive Smoothing of Path Integral Control) that applies an inf-convolution to the cost function to speed up convergence of policy optimization. We identify PICE as the infinite-smoothing limit of this technique and show that the sample-efficiency problems of PICE disappear for finite levels of smoothing. For zero smoothing, the method becomes a greedy optimization of the cost, which is the standard approach in current reinforcement learning. We show analytically and empirically that intermediate levels of smoothing are optimal, which makes the new method superior to both PICE and direct cost optimization.

Systems and Control; Machine Learning
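To make "inf-convolution" concrete: the generic inf-convolution of a cost f with a quadratic kernel (the Moreau envelope) is f_alpha(x) = min_y [f(y) + (x - y)^2 / (2 alpha)], where alpha sets the smoothing level and alpha -> 0 recovers f. ASPIC applies an inf-convolution to the cost function, possibly with a different kernel; the brute-force 1-D grid evaluation below, including the function name `inf_convolution`, is only a hypothetical illustration of the operation and not the ASPIC algorithm.

```python
import numpy as np


def inf_convolution(f_vals, grid, alpha):
    """Brute-force Moreau envelope on a 1-D grid: min_y f(y) + (x - y)**2 / (2 * alpha)."""
    diff = grid[:, None] - grid[None, :]  # diff[i, j] = x_i - y_j
    # For each x_i, take the minimum over all grid points y_j of cost plus quadratic penalty.
    return np.min(f_vals[None, :] + diff**2 / (2 * alpha), axis=1)
```

For f(x) = |x|, the result is the Huber function (x^2 / (2 alpha) for |x| <= alpha, |x| - alpha/2 elsewhere): a larger alpha smooths the kink over a wider region, while a very small alpha returns f essentially unchanged.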

Generalized Policy Iteration for Optimal Control in Continuous Time

Jingliang Duan, Shengbo Eben Li, Zhengyu Liu (2019)

This paper proposes the Deep Generalized Policy Iteration (DGPI) algorithm to find the infinite-horizon optimal control policy for general nonlinear continuous-time systems with known dynamics. Unlike existing adaptive dynamic programming algorithms for continuous-time systems, DGPI requires neither an admissible initial policy nor an input-affine system for convergence. The algorithm employs an actor-critic architecture to approximate both the policy and the value function in order to iteratively solve the Hamilton-Jacobi-Bellman (HJB) equation. Both functions are approximated by deep neural networks. Starting from an arbitrary initial policy, DGPI eventually converges to an admissible, and subsequently an optimal, policy for an arbitrary nonlinear system. We also relax the update termination conditions of both the policy evaluation and policy improvement processes, which yields faster convergence than conventional policy iteration (PI) methods with the same function-approximator architecture. We further prove the convergence and optimality of the algorithm through a thorough Lyapunov analysis and demonstrate its generality and efficacy on two detailed numerical examples.

Systems and Control; Machine Learning
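As a point of comparison for "conventional PI", classical continuous-time policy iteration (Kleinman's algorithm) can be written out for a scalar linear system dx/dt = a x + b u with cost integral of (q x^2 + r u^2), where the value function is V(x) = p x^2 and the HJB equation reduces to a scalar Riccati equation. This is a textbook baseline, not DGPI: it requires an admissible (stabilizing) initial gain, which is exactly the requirement DGPI removes. The function name and the numbers used below are illustrative assumptions.

```python
def kleinman_policy_iteration(a, b, q, r, k0, n_iter=20):
    """Policy iteration for dx/dt = a*x + b*u, cost = integral of q*x^2 + r*u^2, policy u = -k*x."""
    k = k0
    p = None
    for _ in range(n_iter):
        closed_loop = a - b * k
        if closed_loop >= 0:
            raise ValueError("policy u = -k x is not admissible (closed loop unstable)")
        # Policy evaluation: V(x) = p*x^2 satisfies q + r*k^2 + 2*p*(a - b*k) = 0.
        p = (q + r * k * k) / (-2.0 * closed_loop)
        # Policy improvement: minimizing r*u^2 + 2*p*x*b*u over u gives u = -(b*p/r) * x.
        k = b * p / r
    return p, k
```

With a = b = q = r = 1 and k0 = 2, p converges to the Riccati solution 1 + sqrt(2) (about 2.414) within a few iterations, whereas k0 = 0.5 (an unstable closed loop) is rejected before any evaluation step.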

Sparse Bayesian Deep Learning for Dynamic System Identification

Hongpeng Zhou, Chahine Ibrahim, Wei Xing Zheng (2021)

This paper proposes a sparse Bayesian treatment of deep neural networks (DNNs) for system identification. Although DNNs show impressive approximation ability in various fields, several challenges remain for system identification problems. First, DNNs are so complex that they can easily overfit the training data. Second, selecting the input regressors for system identification is nontrivial. Third, the uncertainty of the model parameters and predictions needs to be quantified. The proposed Bayesian approach offers a principled way to address these challenges through marginal likelihood (model evidence) approximation and the construction of structured, group-sparsity-inducing priors. The identification algorithm is derived as an iterative regularized optimization procedure that can be solved as efficiently as training a typical DNN. Furthermore, a practical calculation approach based on Monte Carlo integration is derived to quantify the uncertainty of the parameters and predictions. The effectiveness of the proposed Bayesian approach is demonstrated on several linear and nonlinear system identification benchmarks, where it achieves good and competitive simulation accuracy.

Systems and Control; Machine Learning
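The Monte Carlo uncertainty quantification mentioned in this abstract can be sketched generically: draw parameter samples from an approximate posterior, push each sample through the model, and take the sample mean and variance of the predictions. The Gaussian posterior, the fixed polynomial feature map standing in for a DNN, and the function names `features` and `mc_predict` are hypothetical assumptions. A linear-in-parameters model is used only because its exact answer is known, which makes the estimate easy to check.

```python
import numpy as np


def features(x):
    """Hypothetical fixed basis [1, x, x^2]; a DNN's forward pass would replace this."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.stack([np.ones_like(x), x, x**2], axis=-1)


def mc_predict(x, w_mean, w_cov, noise_var, n_samples=100_000, seed=0):
    """Monte Carlo predictive mean and variance under an assumed Gaussian weight posterior."""
    rng = np.random.default_rng(seed)
    W = rng.multivariate_normal(w_mean, w_cov, size=n_samples)  # (n_samples, n_weights)
    preds = features(x) @ W.T                                   # (n_points, n_samples)
    mean = preds.mean(axis=1)
    # Parameter (epistemic) spread plus observation-noise (aleatoric) variance.
    var = preds.var(axis=1) + noise_var
    return mean, var
```

With weight mean (0.5, -1, 0.25), covariance diag(0.01, 0.04, 0.0025) and noise variance 0.01, the estimate at x = 2 approaches the exact predictive mean -0.5 and variance 0.22.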

Control Barrier Functions for Unknown Nonlinear Systems using Gaussian Processes

Pushpak Jagtap, George J. Pappas, Majid Zamani (2020)

This paper focuses on controller synthesis for unknown nonlinear systems while ensuring safety constraints. The approach consists of two steps: a learning step that uses Gaussian processes and a controller synthesis step based on control barrier functions. In the learning step, a data-driven approach uses Gaussian processes to learn the unknown control-affine nonlinear dynamics, together with a statistical bound on the accuracy of the learned model. In the controller synthesis step, we develop a systematic approach to computing control barrier functions that explicitly takes the uncertainty of the learned model into account. The control barrier function not only yields a controller that is safe by construction but also provides a rigorous lower bound on the probability of satisfying the safety specification. Finally, we illustrate the effectiveness of the proposed results by synthesizing a safety controller for a jet engine example.

Systems and Control; Machine Learning
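The controller synthesis step can be illustrated with the standard control-barrier-function safety filter for a control-affine system with a scalar input. In that case the usual quadratic program (stay as close as possible to a nominal input while satisfying Lf h + Lg h u >= -alpha h, where Lf h and Lg h are the Lie derivatives of the barrier h) has a closed-form solution. The `margin` parameter is a hypothetical stand-in for how a bound on the learned model's error might tighten the constraint. The paper's Gaussian-process-based construction and its probabilistic guarantee are not reproduced here, and the function name is illustrative.

```python
def cbf_safety_filter(u_nom, h, lf_h, lg_h, alpha, margin=0.0):
    """Closest scalar input to u_nom satisfying lf_h + lg_h*u >= -alpha*h + margin.

    margin is a hypothetical robustness term (e.g. derived from a model-error bound).
    """
    rhs = -alpha * h - lf_h + margin  # constraint rewritten as lg_h * u >= rhs
    if lg_h * u_nom >= rhs:
        return u_nom                  # nominal input is already safe
    if lg_h == 0:
        raise ValueError("barrier condition cannot be enforced: Lg h = 0")
    return rhs / lg_h                 # project u_nom onto the constraint boundary
```

For dx/dt = u with the safe set x <= 1 (h = 1 - x, so Lf h = 0 and Lg h = -1), alpha = 1 and x = 0.9, a nominal input u = 1 is reduced to 0.1, and a margin of 0.05 tightens it further to 0.05. A nominal input that already satisfies the condition passes through unchanged.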