This paper presents a safe reinforcement learning system for automated driving that benefits from multimodal future trajectory predictions. We propose a safety system that consists of two components: a heuristic safety module and a learning-based safety module. The heuristic safety module is based on common driving rules. The learning-based safety module, on the other hand, is a data-driven safety rule that learns safety patterns from driving data. Specifically, it utilizes a mixture density recurrent neural network (MD-RNN) for multimodal future trajectory prediction to accelerate the learning process. Our simulation results demonstrate that the proposed safety system outperforms previously reported results in terms of average reward and number of collisions.
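As an illustration of the multimodal prediction component described above, the following is a minimal sketch of a mixture-density recurrent network, assuming a PyTorch implementation; the layer sizes, number of mixture components, and the loss helper are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MDRNN(nn.Module):
    """Minimal mixture-density RNN: an LSTM whose output parameterizes a
    Gaussian mixture over the next 2-D trajectory point (multimodal prediction)."""

    def __init__(self, input_dim=2, hidden_dim=64, n_mixtures=5):
        super().__init__()
        self.n_mixtures = n_mixtures
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        # Per mixture component: weight (1), mean (2), std (2) -> 5 values each.
        self.head = nn.Linear(hidden_dim, n_mixtures * 5)

    def forward(self, traj):
        # traj: (batch, seq_len, 2) past positions of a traffic vehicle
        h, _ = self.lstm(traj)
        params = self.head(h[:, -1])                     # use last hidden state
        pi, mu, sigma = params.split(
            [self.n_mixtures, 2 * self.n_mixtures, 2 * self.n_mixtures], dim=-1)
        pi = torch.softmax(pi, dim=-1)                   # mixture weights
        mu = mu.view(-1, self.n_mixtures, 2)             # per-mode means
        sigma = torch.exp(sigma).view(-1, self.n_mixtures, 2)  # positive stds
        return pi, mu, sigma

def mdn_loss(pi, mu, sigma, target):
    """Negative log-likelihood of the observed next point under the mixture."""
    dist = torch.distributions.Normal(mu, sigma)
    log_prob = dist.log_prob(target.unsqueeze(1)).sum(-1)   # (batch, n_mixtures)
    return -torch.logsumexp(torch.log(pi) + log_prob, dim=-1).mean()
```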
In this paper, we present a safe deep reinforcement learning system for automated driving. The proposed framework leverages the merits of both rule-based and learning-based approaches to safety assurance. Our safety system consists of two modules, namely a handcrafted safety module and a dynamically-learned safety module. The handcrafted safety module is a heuristic safety rule, based on common driving practice, that ensures a minimum relative gap to a traffic vehicle. The dynamically-learned safety module, on the other hand, is a data-driven safety rule that learns safety patterns from driving data. Specifically, the dynamically-learned safety module incorporates a model lookahead beyond the immediate reward of reinforcement learning to predict safety further into the future. If one of the predicted future states leads to a near-miss or collision, a negative reward is assigned to the reward function to avoid collisions and accelerate the learning process. We demonstrate the capability of the proposed framework in a simulation environment with varying traffic density. Our results show the superior capabilities of the policy enhanced with the dynamically-learned safety module.
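The model-lookahead idea can be sketched as a simple reward-shaping wrapper: roll a predictive model forward a few steps and penalize the current transition if any predicted state is a near-miss or collision. The function name `predict_next_state`, the horizon, the gap threshold, and the penalty value are hypothetical placeholders, not values from the paper.

```python
COLLISION_PENALTY = -10.0   # extra negative reward for predicted unsafe futures
HORIZON = 5                 # number of lookahead steps
MIN_GAP = 2.0               # near-miss threshold on relative gap (metres)

def shaped_reward(env_reward, state, action, predict_next_state):
    """Add a negative term if the predicted rollout reaches a near-miss or collision."""
    s = state
    for _ in range(HORIZON):
        s = predict_next_state(s, action)       # learned dynamics / trajectory model
        if s["gap_to_lead_vehicle"] < MIN_GAP:  # predicted near-miss or collision
            return env_reward + COLLISION_PENALTY
    return env_reward
```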
We present an integrated approach to perception and control for an autonomous vehicle and demonstrate it in a high-fidelity urban driving simulator. Our approach first builds a model of the environment and then trains a policy that exploits the learned model to identify the action to take at each time step. To build the environment model, we leverage several deep learning algorithms. To that end, we first train a variational autoencoder to encode the input image into an abstract latent representation. We then utilize a recurrent neural network to predict the latent representation of the next frame and handle temporal information. Finally, we utilize an evolutionary reinforcement learning algorithm to train a controller on these latent representations to identify the action to take. We evaluate our approach in CARLA, a high-fidelity urban driving simulator, and conduct an extensive generalization study. Our results demonstrate that our approach outperforms several previously reported approaches in terms of the percentage of successfully completed episodes for a lane-keeping task.
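The perception-to-control pipeline described above (VAE encoder, recurrent latent model, lightweight controller) can be wired together roughly as follows. This is a minimal PyTorch sketch with illustrative dimensions, not the authors' architecture, and the controller weights would be optimized by an evolutionary method such as CMA-ES rather than by backpropagation.

```python
import torch
import torch.nn as nn

LATENT_DIM, HIDDEN_DIM, N_ACTIONS = 32, 256, 3

class Encoder(nn.Module):
    """Toy VAE encoder: image -> (mu, logvar) of a latent code."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten())
        self.mu = nn.LazyLinear(LATENT_DIM)
        self.logvar = nn.LazyLinear(LATENT_DIM)

    def forward(self, img):
        h = self.conv(img)
        return self.mu(h), self.logvar(h)

rnn = nn.LSTMCell(LATENT_DIM + N_ACTIONS, HIDDEN_DIM)        # temporal latent model
controller = nn.Linear(LATENT_DIM + HIDDEN_DIM, N_ACTIONS)   # trained by evolution

def act(img, prev_action, hidden, encoder):
    """One control step: encode the frame, advance the RNN, query the controller."""
    mu, logvar = encoder(img)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterized latent
    h, c = rnn(torch.cat([z, prev_action], dim=-1), hidden)
    action = torch.tanh(controller(torch.cat([z, h], dim=-1)))
    return action, (h, c)
```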
Under-voltage load shedding has been considered a standard and effective measure to recover the voltage stability of the electric power grid under emergency and severe conditions. However, this scheme usually trips a massive amount of load, which can be unnecessary and harmful to customers. Recently, deep reinforcement learning (RL) has been regarded and adopted as a promising approach that can significantly reduce the amount of load shedding. However, like most existing machine learning (ML)-based control techniques, RL control usually cannot guarantee the safety of the systems under control. In this paper, we introduce a novel safe RL method for emergency load shedding of power systems that can enhance the safe voltage recovery of the electric power grid after experiencing faults. Unlike the standard RL method, the safe RL method has a reward function that incorporates a barrier function going to minus infinity as the system state approaches the safety bounds. Consequently, the optimal control policy keeps the power system away from the safety bounds. This method is general and can be applied to other safety-critical control problems. Numerical simulations on the IEEE 39-bus benchmark are performed to demonstrate the effectiveness of the proposed safe RL emergency control, as well as its adaptive capability to faults not seen during training.
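A minimal sketch of such a barrier-shaped reward, assuming a single per-unit bus voltage with illustrative safety bounds; the log-barrier term tends to minus infinity as the voltage approaches either bound, which is the property the abstract relies on. The bound values and the base reward are assumptions, not the paper's settings.

```python
import numpy as np

V_MIN, V_MAX = 0.95, 1.05   # illustrative per-unit voltage safety bounds

def barrier_reward(voltage, base_reward):
    """Reward with a log-barrier term that tends to -inf as voltage nears a bound."""
    if voltage <= V_MIN or voltage >= V_MAX:
        return -np.inf                      # outside the safe set
    barrier = np.log(voltage - V_MIN) + np.log(V_MAX - voltage)
    return base_reward + barrier
```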
Safe reinforcement learning aims to learn a control policy while ensuring that neither the system nor the environment gets damaged during the learning process. For implementing safe reinforcement learning on highly nonlinear and high-dimensional dynamical systems, one possible approach is to find a low-dimensional safe region via data-driven feature extraction methods, which provides safety estimates to the learning algorithm. As the reliability of the learned safety estimates is data-dependent, we investigate in this work how different training data affect the safe reinforcement learning approach. By balancing the learning performance against the risk of being unsafe, a data generation method that combines two sampling methods is proposed to generate representative training data. The performance of the method is demonstrated on a three-link inverted pendulum example.
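One way to realize the combined sampling strategy is to mix uniform sampling of the state space with perturbations around states near the estimated safety boundary; the mixing ratio, noise scale, and helper names below are assumptions for illustration only, not the paper's data generation method.

```python
import numpy as np

def generate_training_states(n_samples, state_low, state_high,
                             boundary_states, boundary_std=0.05, mix=0.5):
    """Combine two samplers: uniform over the state box, plus Gaussian
    perturbations of states near the estimated safety boundary."""
    state_low = np.asarray(state_low)
    state_high = np.asarray(state_high)
    n_uniform = int(mix * n_samples)
    uniform = np.random.uniform(state_low, state_high,
                                size=(n_uniform, len(state_low)))
    idx = np.random.randint(len(boundary_states), size=n_samples - n_uniform)
    near_boundary = boundary_states[idx] + boundary_std * np.random.randn(
        n_samples - n_uniform, len(state_low))
    return np.vstack([uniform, near_boundary])
```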
This paper describes a verification case study on an autonomous racing car with a neural network (NN) controller. Although several verification approaches have been proposed over the last year, they have only been evaluated on low-dimensional systems or systems with constrained environments. To explore the limits of existing approaches, we present a challenging benchmark in which the NN takes raw LiDAR measurements as input and outputs steering commands for the car. We train a dozen NNs using two reinforcement learning algorithms and show that the state of the art in verification can handle systems with around 40 LiDAR rays, well short of a typical LiDAR scan with 1081 rays. Furthermore, we perform real experiments to investigate the benefits and limitations of verification with respect to the sim2real gap, i.e., the difference between a system's modeled and real performance. We identify cases, similar to the modeled environment, in which verification is strongly correlated with safe behavior. Finally, we illustrate LiDAR fault patterns that can be used to develop robust and safe reinforcement learning algorithms.
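For concreteness, a controller of the kind described above can be as simple as a small fully connected network mapping LiDAR ray distances to a steering command; the architecture below is a hypothetical example at the 40-ray scale mentioned in the abstract, not one of the networks verified in the paper.

```python
import torch.nn as nn

N_RAYS = 40   # number of LiDAR rays fed to the policy at the verified scale

# Illustrative policy: raw LiDAR distances in, a single steering command out.
steering_policy = nn.Sequential(
    nn.Linear(N_RAYS, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1), nn.Tanh())   # steering normalized to [-1, 1]
```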