
Imitation Learning with Stability and Safety Guarantees

Published by He Yin
Publication date: 2020
Paper language: English





A method is presented to learn neural network (NN) controllers with stability and safety guarantees through imitation learning (IL). Convex stability and safety conditions are derived for linear time-invariant plant dynamics with NN controllers by merging Lyapunov theory with local quadratic constraints that bound the nonlinear activation functions in the NN. These conditions are incorporated into the IL process, which simultaneously minimizes the IL loss and maximizes the volume of the region of attraction associated with the NN controller. An algorithm based on the alternating direction method of multipliers is proposed to solve the IL problem. The method is illustrated on an inverted pendulum system, aircraft longitudinal dynamics, and vehicle lateral dynamics.
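As a rough illustration of two ingredients named in the abstract, the sketch below checks a local sector (quadratic) constraint for a tanh activation and solves a discrete-time Lyapunov equation for a small closed loop. It is not the paper's method: the plant matrices `A` and `B`, the linear gain `K` standing in for an NN controller, the bound `v_bar`, and all function names are assumptions chosen for this example.

```python
import numpy as np

def local_sector_bounds(v_bar):
    """Sector [alpha, beta] that contains tanh on the interval [-v_bar, v_bar]."""
    alpha = np.tanh(v_bar) / v_bar  # slope of the chord from 0 to v_bar
    beta = 1.0                      # slope of tanh at the origin
    return alpha, beta

def sector_qc(v, alpha, beta):
    """Quadratic constraint (phi - alpha*v)(beta*v - phi); nonnegative inside the sector."""
    phi = np.tanh(v)
    return (phi - alpha * v) * (beta * v - phi)

def solve_discrete_lyapunov(A_cl, Q):
    """Solve A_cl^T P A_cl - P = -Q by vectorization (practical for small systems only)."""
    n = A_cl.shape[0]
    M = np.eye(n * n) - np.kron(A_cl.T, A_cl.T)
    P = np.linalg.solve(M, Q.flatten()).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

# Hypothetical pre-activation bound and the resulting local sector.
v_bar = 2.0
alpha, beta = local_sector_bounds(v_bar)
grid = np.linspace(-v_bar, v_bar, 401)
qc_values = sector_qc(grid, alpha, beta)

# Hypothetical discrete-time plant with a linear gain in place of the NN.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[-1.0, -2.0]])
A_cl = A + B @ K

Q = np.eye(2)
P = solve_discrete_lyapunov(A_cl, Q)
residual = A_cl.T @ P @ A_cl - P + Q

# V(x) = x^T P x is a Lyapunov function if P is positive definite.
is_lyapunov = bool(np.all(np.linalg.eigvalsh(P) > 0))
```

Shrinking `v_bar` raises `alpha` toward 1 and so tightens the sector. A local bound of this kind is less conservative than the global sector [0, 1], but it only holds while the pre-activation stays within [-v_bar, v_bar].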




Read also

162 - Yixuan Wang, Chao Huang, Qi Zhu 2020
Neural networks have been increasingly applied for control in learning-enabled cyber-physical systems (LE-CPSs) and have demonstrated great promise in improving system performance and efficiency, as well as reducing the need for complex physical models. However, the lack of safety guarantees for such neural network based controllers has significantly impeded their adoption in safety-critical CPSs. In this work, we propose a controller adaptation approach that automatically switches among multiple controllers, including neural network controllers, to guarantee system safety and improve energy efficiency. Our approach includes two key components based on formal methods and machine learning. First, we approximate each controller with a Bernstein-polynomial based hybrid system model under bounded disturbance, and compute a safe invariant set for each controller based on its corresponding hybrid system. Intuitively, the invariant set of a controller defines the region of the state space where the system can always remain safe under its control. The union of the controllers' invariant sets then defines a safe adaptation space that is larger than (or equal to) that of each controller. Second, we develop a deep reinforcement learning method to learn a controller switching strategy that reduces the control/actuation energy cost while, with the help of a safety guard rule, ensuring that the system stays within the safe space. Experiments on a linear adaptive cruise control system and a nonlinear Van der Pol oscillator demonstrate the effectiveness of our approach in saving energy and enhancing safety.
Control schemes for autonomous systems are often designed in a way that anticipates the worst case in any situation. At runtime, however, there could exist opportunities to leverage the characteristics of the specific environment and operating context for more efficient control. In this work, we develop an online intermittent-control framework that combines formal verification with model-based optimization and deep reinforcement learning to opportunistically skip certain control computation and actuation steps, saving actuation energy and computational resources without compromising system safety. Experiments on an adaptive cruise control system demonstrate that our approach can achieve significant energy and computation savings.
We propose Kernel Predictive Control (KPC), a learning-based predictive control strategy that enjoys deterministic guarantees of safety. Noise-corrupted samples of the unknown system dynamics are used to learn several models through the formalism of non-parametric kernel regression. By treating each prediction step individually, we dispense with the need to propagate sets through highly nonlinear maps, a procedure that often involves multiple conservative approximation steps. Finite-sample error bounds are then used to enforce state feasibility by employing an efficient robust formulation. We then present a relaxation strategy that exploits online data to weaken the optimization problem constraints while preserving safety. Two numerical examples are provided to illustrate the applicability of the proposed control method.
152 - Jun Liu, Yiming Meng, Yinan Li 2020
Stability and safety are two important aspects in safety-critical control of dynamical systems. It is a well-established fact in control theory that stability properties can be characterized by Lyapunov functions. Reachability properties can also be naturally captured by Lyapunov functions for finite-time stability. Motivated by safety-critical control applications, such as in autonomous systems and robotics, there has been a recent surge of interest in characterizing safety properties using barrier functions. Lyapunov and barrier function conditions, however, are sometimes viewed as competing objectives. In this paper, we provide a unified theoretical treatment of Lyapunov and barrier functions in terms of converse theorems for stability properties with safety guarantees and reach-avoid-stay type specifications. We show that if a system (modeled as a perturbed dynamical system) possesses a stability with safety property, then there exists a smooth Lyapunov function to certify such a property. This Lyapunov function is shown to be defined on the entire set of initial conditions from which solutions satisfy this property. A similar but slightly weaker statement is made for reach-avoid-stay specifications. We show by a simple example that the latter statement cannot be strengthened without additional assumptions.
We develop a control algorithm that ensures the safety, in terms of confinement in a set, of a system with unknown, 2nd-order nonlinear dynamics. The algorithm establishes novel connections between data-driven and robust, nonlinear control. It is based on data obtained online from the current trajectory and the concept of reciprocal barriers. More specifically, it first uses the obtained data to calculate set-valued functions that over-approximate the unknown dynamic terms. For the second step of the algorithm, we design a robust control scheme that uses these functions as well as reciprocal barriers to render the safe set forward invariant. In addition, we provide an extension of the algorithm that tackles issues of controllability loss incurred by the nullspace of the control-direction matrix. The algorithm removes a series of standard, limiting assumptions considered in the related literature, since it does not require global boundedness, growth conditions, or a priori approximations of the unknown dynamic terms.