Model-based Constrained Reinforcement Learning using Generalized Control Barrier Function

Published by Haitong Ma

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Haitong Ma, Jianyu Chen, Shengbo Eben Li

Robotics, Machine Learning

Abstract

Model information can be used to predict future trajectories, so it has great potential for avoiding dangerous regions when reinforcement learning (RL) is applied to real-world tasks such as autonomous driving. However, existing studies mostly use model-free constrained RL, which inevitably causes constraint violations. This paper proposes a model-based feasibility enhancement technique for constrained RL, which enhances the feasibility of the policy using a generalized control barrier function (GCBF) defined on the distance to the constraint boundary. By using the model information, the policy can be optimized safely without violating actual safety constraints, and the sample efficiency is increased. The major difficulty, infeasibility when solving the constrained policy gradient, is handled by an adaptive coefficient mechanism. We evaluate the proposed method in both simulations and real vehicle experiments on a complex autonomous driving collision avoidance task. The proposed method achieves up to four times fewer constraint violations and converges 3.36 times faster than baseline constrained RL approaches.
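
The idea of a barrier function defined on the distance to a constraint boundary, checked against a model prediction, can be illustrated with a small sketch. This is not the paper's GCBF formulation, which the abstract does not spell out. It uses the standard discrete-time control barrier condition h(x') >= (1 - lam) * h(x) on a hypothetical one-dimensional model in which the action is a velocity. The names `predict`, `barrier`, `cbf_ok`, and `safe_velocity`, and the constants `DT`, `BOUNDARY`, and `lam`, are assumptions made for illustration.

```python
DT = 0.1         # model time step in seconds (assumed)
BOUNDARY = 10.0  # position of the constraint boundary (assumed)

def predict(pos, vel):
    """Hypothetical one-step model: position integrates the commanded velocity."""
    return pos + vel * DT

def barrier(pos):
    """Barrier value h: distance to the constraint boundary (h >= 0 is safe)."""
    return BOUNDARY - pos

def cbf_ok(pos, vel, lam=0.2):
    """Standard discrete-time CBF condition: h(x') >= (1 - lam) * h(x)."""
    return barrier(predict(pos, vel)) >= (1.0 - lam) * barrier(pos)

def safe_velocity(pos, desired_vel, lam=0.2):
    """Clip the desired velocity to the largest value the condition allows.

    For this model the condition reduces to vel <= lam * h(x) / DT.
    """
    v_max = lam * barrier(pos) / DT
    return min(desired_vel, v_max)
```

In this sketch the allowed approach speed shrinks in proportion to the remaining distance, so the predicted state can approach the boundary but never cross it. A smaller `lam` makes the filter more conservative.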

Lukas Brunke, Melissa Greeff, Adam W. Hall, 2021

The last half-decade has seen a steep rise in the number of contributions on safe learning methods for real-world robotic deployments from both the control and reinforcement learning communities. This article provides a concise but holistic review of the recent advances made in using machine learning to achieve safe decision making under uncertainties, with a focus on unifying the language and frameworks used in control theory and reinforcement learning research. Our review includes: learning-based control approaches that safely improve performance by learning the uncertain dynamics, reinforcement learning approaches that encourage safety or robustness, and methods that can formally certify the safety of a learned control policy. As data- and learning-based robot control methods continue to gain traction, researchers must understand when and how to best leverage them in real-world scenarios where safety is imperative, such as when operating in close proximity to humans. We highlight some of the open challenges that will drive the field of robot learning in the coming years, and emphasize the need for realistic physics-based benchmarks to facilitate fair comparisons between control and reinforcement learning approaches.

Robotics, Machine Learning, Systems and Control

Barrier Function-based Safe Reinforcement Learning for Emergency Control of Power Systems

Thanh Long Vu, Sayak Mukherjee, Renke Huang, 2021

Under-voltage load shedding has been considered a standard and effective measure to recover the voltage stability of the electric power grid under emergency and severe conditions. However, this scheme usually trips a massive amount of load, which can be unnecessary and harmful to customers. Recently, deep reinforcement learning (RL) has been adopted as a promising approach that can significantly reduce the amount of load shedding. However, like most existing machine learning (ML)-based control techniques, RL control usually cannot guarantee the safety of the systems under control. In this paper, we introduce a novel safe RL method for emergency load shedding of power systems that can enhance the safe voltage recovery of the electric power grid after faults. Unlike the standard RL method, the safe RL method has a reward function that includes a barrier function, which goes to minus infinity as the system state approaches the safety bounds. Consequently, the optimal control policy steers the power system away from the safety bounds. This method is general and can be applied to other safety-critical control problems. Numerical simulations on the 39-bus IEEE benchmark are performed to demonstrate the effectiveness of the proposed safe RL emergency control, as well as its ability to adapt to faults not seen in training.
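
A reward term that goes to minus infinity at the safety bounds can be sketched with a logarithmic barrier. The abstract does not give the exact barrier used, so the log form below is an assumption, as are the function name `barrier_reward`, the per-unit voltage bounds `v_min` and `v_max`, and the `weight` parameter.

```python
import math

def barrier_reward(voltage, v_min=0.95, v_max=1.05, weight=1.0):
    """Log-barrier reward term (illustrative, not the paper's exact function).

    Finite and non-positive strictly inside (v_min, v_max);
    minus infinity at or beyond either bound.
    """
    if voltage <= v_min or voltage >= v_max:
        return -math.inf
    width = v_max - v_min
    # Each ratio lies in (0, 1), so each log is negative and
    # diverges to -inf as the voltage approaches that bound.
    lower = math.log((voltage - v_min) / width)
    upper = math.log((v_max - voltage) / width)
    return weight * (lower + upper)
```

Added to a task reward, a term of this shape penalizes states more heavily the closer they come to either bound, which is the behavior the abstract describes.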

Systems and Control

safe-control-gym: a Unified Benchmark Suite for Safe Learning-based Control and Reinforcement Learning

Zhaocong Yuan, Adam W. Hall, Siqi Zhou, 2021

In recent years, reinforcement learning and learning-based control -- as well as the study of their safety, crucial for deployment in real-world robots -- have gained significant traction. However, to adequately gauge the progress and applicability of new results, we need the tools to equitably compare the approaches proposed by the controls and reinforcement learning communities. Here, we propose a new open-source benchmark suite, called safe-control-gym. Our starting point is OpenAI's Gym API, which is one of the de facto standards in reinforcement learning research. Yet, we highlight the reasons for its limited appeal to control theory researchers -- and safe control, in particular -- such as the lack of analytical models and constraint specifications. Thus, we propose to extend this API with (i) the ability to specify (and query) symbolic models and constraints and (ii) the ability to introduce simulated disturbances in the control inputs, measurements, and inertial properties. We provide implementations for three dynamic systems -- the cart-pole and the 1D and 2D quadrotors -- and two control tasks -- stabilization and trajectory tracking. To demonstrate our proposal -- and in an attempt to bring research communities closer together -- we show how to use safe-control-gym to quantitatively compare the control performance, data efficiency, and safety of multiple approaches from the areas of traditional control, learning-based control, and reinforcement learning.

Robotics, Machine Learning, Systems and Control

Model-Based Inverse Reinforcement Learning from Visual Demonstrations

Neha Das, Sarah Bechtle, Todor Davchev, 2020

Scaling model-based inverse reinforcement learning (IRL) to real robotic manipulation tasks with unknown dynamics remains an open problem. The key challenges lie in learning good dynamics models, developing algorithms that scale to high-dimensional state spaces, and being able to learn from both visual and proprioceptive demonstrations. In this work, we present a gradient-based inverse reinforcement learning framework that utilizes a pre-trained visual dynamics model to learn cost functions when given only visual human demonstrations. The learned cost functions are then used to reproduce the demonstrated behavior via visual model predictive control. We evaluate our framework on hardware on two basic object manipulation tasks.

Robotics, Machine Learning

Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control

Frederik Ebert, Chelsea Finn, Sudeep Dasari, 2018

Deep reinforcement learning (RL) algorithms can learn complex robotic skills from raw sensory inputs, but have yet to achieve the kind of broad generalization and applicability demonstrated by deep learning methods in supervised domains. We present a deep RL method that is practical for real-world robotics tasks, such as robotic manipulation, and generalizes effectively to never-before-seen tasks and objects. In these settings, ground truth reward signals are typically unavailable, and we therefore propose a self-supervised model-based approach, where a predictive model learns to directly predict the future from raw sensory readings, such as camera images. At test time, we explore three distinct goal specification methods: designated pixels, where a user specifies desired object manipulation tasks by selecting particular pixels in an image and corresponding goal positions; goal images, where the desired goal state is specified with an image; and image classifiers, which define spaces of goal states. Our deep predictive models are trained using data collected autonomously and continuously by a robot interacting with hundreds of objects, without human supervision. We demonstrate that visual MPC can generalize to never-before-seen objects, both rigid and deformable, and solve a range of user-defined object manipulation tasks using the same model.

Robotics, Artificial Intelligence, Computer Vision and Pattern Recognition