Barrier-Certified Adaptive Reinforcement Learning with Applications to Brushbot Navigation

77 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Motoya Ohnishi

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Motoya Ohnishi - Li Wang - Gennaro Notomista

التعلم الآلي علم الروبوتات أنظمة وتحكم

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This paper presents a safe learning framework that employs an adaptive model learning algorithm together with barrier certificates for systems with possibly nonstationary agent dynamics. To extract the dynamic structure of the model, we use a sparse optimization technique. We use the learned model in combination with control barrier certificates which constrain policies (feedback controllers) in order to maintain safety, which refers to avoiding particular undesirable regions of the state space. Under certain conditions, recovery of safety in the sense of Lyapunov stability after violations of safety due to the nonstationarity is guaranteed. In addition, we reformulate an action-value function approximation to make any kernel-based nonlinear function estimation method applicable to our adaptive learning framework. Lastly, solutions to the barrier-certified policy optimization are guaranteed to be globally optimal, ensuring the greedy policy improvement under mild conditions. The resulting framework is validated via simulations of a quadrotor, which has previously been used under stationarity assumptions in the safe learnings literature, and is then tested on a real robot, the brushbot, whose dynamics is unknown, highly complex and nonstationary.

قيم البحث

85 - Minghao Han , Yuan Tian , Lixian Zhang 2019

Reinforcement learning is showing great potentials in robotics applications, including autonomous driving, robot manipulation and locomotion. However, with complex uncertainties in the real-world environment, it is difficult to guarantee the successf ul generalization and sim-to-real transfer of learned policies theoretically. In this paper, we introduce and extend the idea of robust stability and $H_infty$ control to design policies with both stability and robustness guarantee. Specifically, a sample-based approach for analyzing the Lyapunov stability and performance robustness of a learning-based control system is proposed. Based on the theoretical results, a maximum entropy algorithm is developed for searching Lyapunov function and designing a policy with provable robust stability guarantee. Without any specific domain knowledge, our method can find a policy that is robust to various uncertainties and generalizes well to different test environments. In our experiments, we show that our method achieves better robustness to both large impulsive disturbances and parametric variations in the environment than the state-of-art results in both robust and generic RL, as well as classic control. Anonymous code is available to reproduce the experimental results at https://github.com/RobustStabilityGuaranteeRL/RobustStabilityGuaranteeRL.

التعلم الآلي علم الروبوتات أنظمة وتحكم

Safe Reinforcement Learning Using Advantage-Based Intervention

90 - Nolan Wagener , Byron Boots , Ching-An Cheng 2021

Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning (RL) algorithms that produce a safe p olicy after training, ensuring safety during training as well remains an open problem. A fundamental challenge is performing exploration while still satisfying constraints in an unknown Markov decision process (MDP). In this work, we address this problem for the chance-constrained setting. We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training and optimizes the agents policy using off-the-shelf RL algorithms designed for unconstrained MDPs. Our method comes with strong guarantees on safety during both training and deployment (i.e., after training and without the intervention mechanism) and policy performance compared to the optimal safety-constrained policy. In our experiments, we show that SAILR violates constraints far less during training than standard safe RL and constrained MDP approaches and converges to a well-performing policy that can be deployed safely without intervention. Our code is available at https://github.com/nolanwagener/safe_rl.

التعلم الآلي علم الروبوتات أنظمة وتحكم

Learning-based Adaptive Control using Contraction Theory

298 - Hiroyasu Tsukamoto , Soon-Jo Chung , Jean-Jacques Slotine 2021

We present a deep learning-based adaptive control framework for nonlinear systems with multiplicatively separable parametrization, called aNCM - for adaptive Neural Contraction Metric. The framework utilizes a deep neural network to approximate a sta bilizing adaptive control law parameterized by an optimal contraction metric. The use of deep networks permits real-time implementation of the control law and broad applicability to a variety of systems, including systems modeled with basis function approximation methods. We show using contraction theory that aNCM ensures exponential boundedness of the distance between the target and controlled trajectories even under the presence of the parametric uncertainty, robustly to the learning errors caused by aNCM approximation as well as external additive disturbances. Its superiority to the existing robust and adaptive control methods is demonstrated in a simple cart-pole balancing task.

التعلم الآلي علم الروبوتات أنظمة وتحكم

Integrated and Adaptive Guidance and Control for Endoatmospheric Missiles via Reinforcement Learning

156 - Brian Gaudet , Isaac Charcos , Roberto Furfaro 2021

We apply the meta reinforcement learning framework to optimize an integrated and adaptive guidance and flight control system for an air-to-air missile, implementing the system as a deep neural network (the policy). The policy maps observations direct ly to commanded rates of change for the missiles control surface deflections, with the observations derived with minimal processing from the computationally stabilized line of sight unit vector measured by a strap down seeker, estimated rotational velocity from rate gyros, and control surface deflection angles. The system induces intercept trajectories against a maneuvering target that satisfy control constraints on fin deflection angles, and path constraints on look angle and load. We test the optimized system in a six degrees-of-freedom simulator that includes a non-linear radome model and a strapdown seeker model. Through extensive simulation, we demonstrate that the system can adapt to a large flight envelope and off nominal flight conditions that include perturbation of aerodynamic coefficient parameters and center of pressure locations. Moreover, we find that the system is robust to the parasitic attitude loop induced by radome refraction, imperfect seeker stabilization, and sensor scale factor errors. Finally, we compare our systems performance to two benchmarks: a proportional navigation guidance system benchmark in a simplified 3-DOF environment, which we take as an upper bound on performance attainable with separate guidance and flight control systems, and a longitudinal model of proportional navigation coupled with a three loop autopilot. We find that our system moderately outperforms the former, and outperforms the latter by a large margin.

أنظمة وتحكم علم الروبوتات أنظمة وتحكم

Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States

116 - Peter Wolf , Karl Kurzer , Tobias Wingert 2018

Making the right decision in traffic is a challenging task that is highly dependent on individual preferences as well as the surrounding environment. Therefore it is hard to model solely based on expert knowledge. In this work we use Deep Reinforceme nt Learning to learn maneuver decisions based on a compact semantic state representation. This ensures a consistent model of the environment across scenarios as well as a behavior adaptation function, enabling on-line changes of desired behaviors without re-training. The input for the neural network is a simulated object list similar to that of Radar or Lidar sensors, superimposed by a relational semantic scene description. The state as well as the reward are extended by a behavior adaptation function and a parameterization respectively. With little expert knowledge and a set of mid-level actions, it can be seen that the agent is capable to adhere to traffic rules and learns to drive safely in a variety of situations.

التعلم الآلي علم الروبوتات التعلم الالي