No Arabic abstract
Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) on-line learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable polices. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process.
Control barrier functions have shown great success in addressing control problems with safety guarantees. These methods usually find the next safe control input by solving an online quadratic programming problem. However, model uncertainty is a big challenge in synthesizing controllers. This may lead to the generation of unsafe control actions, resulting in severe consequences. In this paper, we develop a learning framework to deal with system uncertainty. Our method mainly focuses on learning the dynamics of the control barrier function, especially for high relative degree with respect to a system. We show that for each order, the time derivative of the control barrier function can be separated into the time derivative of the nominal control barrier function and a remainder. This implies that we can use a neural network to learn the remainder so that we can approximate the dynamics of the real control barrier function. We show by simulation that our method can generate safe trajectories under parametric uncertainty using a differential drive robot model.
Multi-Agent Reinforcement Learning (MARL) algorithms show amazing performance in simulation in recent years, but placing MARL in real-world applications may suffer safety problems. MARL with centralized shields was proposed and verified in safety games recently. However, centralized shielding approaches can be infeasible in several real-world multi-agent applications that involve non-cooperative agents or communication delay. Thus, we propose to combine MARL with decentralized Control Barrier Function (CBF) shields based on available local information. We establish a safe MARL framework with decentralized multiple CBFs and develop Multi-Agent Deep Deterministic Policy Gradient (MADDPG) to Multi-Agent Deep Deterministic Policy Gradient with decentralized multiple Control Barrier Functions (MADDPG-CBF). Based on a collision-avoidance problem that includes not only cooperative agents but obstacles, we demonstrate the construction of multiple CBFs with safety guarantees in theory. Experiments are conducted and experiment results verify that the proposed safe MARL framework can guarantee the safety of agents included in MARL.
Reinforcement learning (RL) is a promising approach and has limited success towards real-world applications, because ensuring safe exploration or facilitating adequate exploitation is a challenges for controlling robotic systems with unknown models and measurement uncertainties. Such a learning problem becomes even more intractable for complex tasks over continuous space (state-space and action-space). In this paper, we propose a learning-based control framework consisting of several aspects: (1) linear temporal logic (LTL) is leveraged to facilitate complex tasks over an infinite horizons which can be translated to a novel automaton structure; (2) we propose an innovative reward scheme for RL-agent with the formal guarantee such that global optimal policies maximize the probability of satisfying the LTL specifications; (3) based on a reward shaping technique, we develop a modular policy-gradient architecture utilizing the benefits of automaton structures to decompose overall tasks and facilitate the performance of learned controllers; (4) by incorporating Gaussian Processes (GPs) to estimate the uncertain dynamic systems, we synthesize a model-based safeguard using Exponential Control Barrier Functions (ECBFs) to address problems with high-order relative degrees. In addition, we utilize the properties of LTL automatons and ECBFs to construct a guiding process to further improve the efficiency of exploration. Finally, we demonstrate the effectiveness of the framework via several robotic environments. And we show such an ECBF-based modular deep RL algorithm achieves near-perfect success rates and guard safety with a high probability confidence during training.
In this paper, we propose a notion of high-order (zeroing) barrier functions that generalizes the concept of zeroing barrier functions and guarantees set forward invariance by checking their higher order derivatives. The proposed formulation guarantees asymptotic stability of the forward invariant set, which is highly favorable for robustness with respect to model perturbations. No forward completeness assumption is needed in our setting in contrast to existing high order barrier function methods. For the case of controlled dynamical systems, we relax the requirement of uniform relative degree and propose a singularity-free control scheme that yields a locally Lipschitz control signal and guarantees safety. Furthermore, the proposed formulation accounts for performance-critical control: it guarantees that a subset of the forward invariant set will admit any existing, bounded control law, while still ensuring forward invariance of the set. Finally, a non-trivial case study with rigid-body attitude dynamics and interconnected cell regions as the safe region is investigated.
This paper combines episodic learning and control barrier functions in the setting of bipedal locomotion. The safety guarantees that control barrier functions provide are only valid with perfect model knowledge; however, this assumption cannot be met on hardware platforms. To address this, we utilize the notion of projection-to-state safety paired with a machine learning framework in an attempt to learn the model uncertainty as it affects the barrier functions. The proposed approach is demonstrated both in simulation and on hardware for the AMBER-3M bipedal robot in the context of the stepping-stone problem, which requires precise foot placement while walking dynamically.