Using deep reinforcement learning, we train control policies for autonomous vehicles leading a platoon of vehicles onto a roundabout. Using Flow, a library for deep reinforcement learning in micro-simulators, we train two policies: one with noise injected into the state and action space and one without any injected noise. In simulation, the autonomous vehicle learns an emergent metering behavior under both policies, in which it slows to allow for smoother merging. We then directly transfer both policies, without any tuning, to the University of Delaware Scaled Smart City (UDSSC), a 1:25 scale testbed for connected and automated vehicles, and characterize their performance on the scaled city. We show that the noise-free policy ends up crashing and only occasionally performs the metering behavior. The noise-injected policy, however, consistently performs the metering behavior and remains collision-free, suggesting that the noise helps with the zero-shot policy transfer. Additionally, the transferred, noise-injected policy leads to a 5% reduction in average travel time and a 22% reduction in maximum travel time in the UDSSC. Videos of the controllers can be found at https://sites.google.com/view/iccps-policy-transfer.
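The noise injection described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the Flow API. The function names `inject_noise` and `noisy_step`, the Gaussian noise model, the standard deviations, and the action bounds of [-1, 1] are all assumptions chosen for illustration.

```python
import numpy as np

def inject_noise(vec, std, rng, low=None, high=None):
    """Add zero-mean Gaussian noise to a vector, optionally clipping the result.

    The Gaussian model and the clipping bounds are illustrative assumptions.
    """
    noisy = np.asarray(vec, dtype=float) + rng.normal(0.0, std, size=np.shape(vec))
    if low is not None or high is not None:
        noisy = np.clip(noisy, low, high)
    return noisy

def noisy_step(policy, env_step, state, rng, state_std=0.1, action_std=0.1):
    """One training step with noise on both the observed state and the action.

    `policy` maps an observation to an action; `env_step` applies an action.
    Both are hypothetical callables standing in for the learner and simulator.
    """
    observed = inject_noise(state, state_std, rng)        # perturb what the policy sees
    action = policy(observed)
    applied = inject_noise(action, action_std, rng, low=-1.0, high=1.0)  # perturb what is executed
    return env_step(applied)
```

Training against perturbed observations and actions in this way is one common form of domain randomization; the abstract reports only that the noise-injected policy transferred more reliably, not the exact noise model used.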
Although deep reinforcement learning (deep RL) methods have many strengths that make them attractive for autonomous driving, real deep RL applications in autonomous driving have been slowed down by the modeling gap between the source (training) domain and the target (deployment) domain. Unlike current policy transfer approaches, which are generally limited to using uninterpretable neural network representations as the transferred features, we propose to transfer concrete kinematic quantities in autonomous driving. The proposed robust-control-based (RC) generic transfer architecture, which we call RL-RC, incorporates a transferable hierarchical RL trajectory planner and a robust tracking controller based on a disturbance observer (DOB). The deep RL policies, trained with a known nominal dynamics model, are transferred directly to the target domain, where DOB-based robust tracking control is applied to tackle the modeling gap, including vehicle dynamics errors and external disturbances such as side forces. We provide simulations validating the capability of the proposed method to achieve zero-shot transfer across multiple driving scenarios such as lane keeping, lane changing and obstacle avoidance.
This paper concerns applications of a recently developed output-tracking technique to trajectory control of autonomous vehicles. The technique is based on three principles: Newton-Raphson flow for solving algebraic equations, output prediction, and controller speedup. Early applications of the technique, made to simple systems of an academic nature, were implemented by simple algorithms requiring modest computational effort. In contrast, this paper tests it on commonly used dynamic models to see whether it can handle more complex control scenarios. Results are derived from simulations as well as a laboratory setting, and they indicate effective tracking convergence despite the simplicity of the control algorithm.
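The first of the three principles named above, Newton-Raphson flow, can be sketched as follows. A Newton-Raphson flow is the continuous-time analogue of Newton's method: to solve g(x) = 0, it integrates dx/dt = -alpha * J(x)^{-1} g(x), where J is the Jacobian of g. The sketch below uses forward-Euler integration on a scalar example; the function name, the gain `alpha` (loosely playing the role of a speedup factor), the step size, and the test equation are assumptions for illustration, and this is not the paper's tracking controller.

```python
import numpy as np

def newton_raphson_flow(g, jac, x0, alpha=1.0, dt=0.01, steps=2000):
    """Integrate dx/dt = -alpha * J(x)^{-1} g(x) with forward Euler.

    g   : callable returning the residual vector g(x)
    jac : callable returning the Jacobian matrix of g at x
    """
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    for _ in range(steps):
        # Newton direction, obtained by solving J(x) * d = g(x)
        direction = np.linalg.solve(jac(x), g(x))
        x = x - dt * alpha * direction
    return x

# Example: solve x^2 - 2 = 0 starting from x = 1.
root = newton_raphson_flow(
    g=lambda x: np.array([x[0] ** 2 - 2.0]),
    jac=lambda x: np.array([[2.0 * x[0]]]),
    x0=1.0,
)
```

Along the exact flow the residual g(x(t)) decays like exp(-alpha * t), so a larger `alpha` gives faster convergence, subject to the Euler step remaining stable (dt * alpha well below 1).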
Optimal and Learning Control for Autonomous Robots has been taught in the Robotics, Systems and Controls Masters at ETH Zurich with the aim of teaching optimal control and reinforcement learning for closed-loop control problems from a unified point of view. The starting point is the formulation of an optimal control problem, from which the different types of solutions and algorithms are derived. These lecture notes aim to support this unified view with a unified notation wherever possible, and with some translation help for comparing the terminology and notation of the different fields. The course assumes basic knowledge of Control Theory, Linear Algebra and Stochastic Calculus.
In recent years, there has been an explosion of studies on autonomous vehicles. Many have collected large amounts of data from human drivers. However, compared to this tedious data collection approach, building a virtual simulation of traffic makes autonomous vehicle research more flexible, time-saving, and scalable. Our work features a 3D simulation that takes in real-time position information parsed from street cameras. The simulation can easily switch between a global bird's-eye view of the traffic and the local perspective of a single car. It can also filter out certain objects in its customized camera, creating separate channels for objects of different categories. This provides alternative supervised or unsupervised ways to train deep neural networks. Another advantage of the 3D simulation is its conformance to physical laws. Because acceleration and collisions arise naturally in the simulation, the system is prepared for potential deep reinforcement learning needs.
Emergent cooperative adaptive cruise control (CACC) strategies being proposed in the literature for platoon formation in the Connected Autonomous Vehicle (CAV) context mostly assume idealized fixed information flow topologies (IFTs) for the platoon, implying guaranteed vehicle-to-vehicle (V2V) communications for the IFT assumed. Since CACC strategies entail continuous information broadcasting, communication failures can occur in congested CAV traffic networks, leading to a platoon's IFT varying dynamically. To enhance the performance of CACC strategies, this study proposes the idea of dynamically optimizing the IFT for CACC, labeled the CACC-OIFT strategy. Under CACC-OIFT, the vehicles in the platoon cooperatively determine in real time which vehicles will dynamically deactivate or activate the send functionality of their V2V communication devices to generate IFTs that optimize the platoon performance in terms of string stability under the ambient traffic conditions. Given the adaptive Proportional-Derivative (PD) controller with a two-predecessor-following scheme, the ambient traffic conditions, and the platoon size just before the start of a time period, the IFT optimization model determines the optimal IFT that maximizes the expected string stability. The optimal IFT is deployed for that time period, and the adaptive PD controller continuously determines the car-following behaviors of the vehicles based on the unfolding degeneration scenario for each time instant within that period. The effectiveness of the proposed CACC-OIFT is validated through numerical experiments in NS-3 based on NGSIM field data. The results indicate that the proposed CACC-OIFT can significantly enhance the string stability of platoon control in an unreliable V2V communication context, outperforming CACCs with fixed IFTs or with passive adaptive schemes for IFT dynamics.