
End-to-End Vision-Based Adaptive Cruise Control (ACC) Using Deep Reinforcement Learning

Posted by Ziran Wang
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





This paper presents a deep reinforcement learning method, Double Deep Q-networks (DDQN), to design an end-to-end vision-based adaptive cruise control (ACC) system. A simulation environment of a highway scene is set up in Unity, a game engine that provides both the physical vehicle models and the feature data for training and testing. Reward functions based on the following distance and the throttle/brake force are designed and implemented in the reinforcement learning model for both internal combustion engine (ICE) vehicles and electric vehicles (EVs) to perform adaptive cruise control. The gap statistics and total energy consumption are evaluated for different vehicle types to explore the relationship between reward functions and powertrain characteristics. Compared with traditional radar-based ACC systems or human-in-the-loop simulation, the proposed vision-based ACC system can generate either a better gap-regulated trajectory or a smoother speed trajectory, depending on the preset reward function. The proposed system adapts well to different speed trajectories of the preceding vehicle and operates in real time.
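As a rough illustration of the two ingredients named in the abstract (a reward coupling the following gap with the throttle/brake effort, and the Double DQN update), here is a minimal Python sketch. The functional form, weights, and names are assumptions for illustration, not the paper's exact reward functions or training code.

```python
import numpy as np

# Hypothetical ACC reward in the spirit of the abstract: penalize deviation from
# a desired following gap and penalize large throttle/brake commands. Weights
# and the linear penalty form are illustrative assumptions.
def acc_reward(gap_m, desired_gap_m, throttle_brake, w_gap=1.0, w_effort=0.1):
    gap_error = abs(gap_m - desired_gap_m)
    return -w_gap * gap_error - w_effort * abs(throttle_brake)

# Double DQN target: the online network selects the next action, the target
# network evaluates it, which reduces Q-value overestimation.
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    if done:
        return reward
    best_action = int(np.argmax(next_q_online))   # action chosen by online net
    return reward + gamma * next_q_target[best_action]
```

Shifting the weight between the gap term and the effort term is what would trade a tighter gap-regulated trajectory against a smoother speed profile, which mirrors the comparison the abstract draws between the two reward settings.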




Read also

We demonstrate the possibility of learning drone swarm controllers that are zero-shot transferable to real quadrotors via large-scale multi-agent end-to-end reinforcement learning. We train policies parameterized by neural networks that are capable of controlling individual drones in a swarm in a fully decentralized manner. Our policies, trained in simulated environments with realistic quadrotor physics, demonstrate advanced flocking behaviors, perform aggressive maneuvers in tight formations while avoiding collisions with each other, break and re-establish formations to avoid collisions with moving obstacles, and efficiently coordinate in pursuit-evasion tasks. We analyze, in simulation, how different model architectures and parameters of the training regime influence the final performance of neural swarms. We demonstrate the successful deployment of the model learned in simulation to highly resource-constrained physical quadrotors performing stationkeeping and goal swapping behaviors. Code and video demonstrations are available at the project website https://sites.google.com/view/swarm-rl.
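A minimal sketch of the decentralized setup described above: every drone runs the same policy network on its own local observation. The observation and action sizes and the layer widths here are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Shared policy applied independently per drone (fully decentralized control).
# obs_dim would hold the drone's own state plus relative neighbor/obstacle
# positions; act_dim could be thrust commands. All sizes are illustrative.
class DronePolicy(nn.Module):
    def __init__(self, obs_dim=18, act_dim=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, local_obs):
        return self.net(local_obs)

# At runtime each drone evaluates the shared policy on its own observation:
# actions = [policy(obs_i) for obs_i in per_drone_observations]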
The combination of deep neural network models and reinforcement learning algorithms can make it possible to learn policies for robotic behaviors that directly read in raw sensory inputs, such as camera images, effectively subsuming both estimation and control into one model. However, real-world applications of reinforcement learning must specify the goal of the task by means of a manually programmed reward function, which in practice requires either designing the very same perception pipeline that end-to-end reinforcement learning promises to avoid, or else instrumenting the environment with additional sensors to determine if the task has been performed successfully. In this paper, we propose an approach for removing the need for manual engineering of reward specifications by enabling a robot to learn from a modest number of examples of successful outcomes, followed by actively solicited queries, where the robot shows the user a state and asks for a label to determine whether that state represents successful completion of the task. While requesting labels for every single state would amount to asking the user to manually provide the reward signal, our method requires labels for only a tiny fraction of the states seen during training, making it an efficient and practical approach for learning skills without manually engineered rewards. We evaluate our method on real-world robotic manipulation tasks where the observations consist of images viewed by the robot's camera. In our experiments, our method effectively learns to arrange objects, place books, and drape cloth, directly from images and without any manually specified reward functions, and with only 1-4 hours of interaction with the real world.
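The core idea above is to replace a hand-engineered reward with a classifier trained on success examples (and actively queried labels), whose predicted success probability serves as the reward. The sketch below illustrates that idea only; the architecture and usage are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Image-based success classifier: trained on user-labeled outcome examples,
# its output probability is then used as the reward signal during RL.
class SuccessClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, image):
        return torch.sigmoid(self.head(self.features(image)))

def learned_reward(classifier, image):
    # image: (3, H, W) tensor of the robot camera view.
    with torch.no_grad():
        return classifier(image.unsqueeze(0)).item()
```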
The Correlation Filter is an algorithm that trains a linear template to discriminate between images and their translations. It is well suited to object tracking because its formulation in the Fourier domain provides a fast solution, enabling the detector to be re-trained once per frame. Previous works that use the Correlation Filter, however, have adopted features that were either manually designed or trained for a different task. This work is the first to overcome this limitation by interpreting the Correlation Filter learner, which has a closed-form solution, as a differentiable layer in a deep neural network. This enables learning deep features that are tightly coupled to the Correlation Filter. Experiments illustrate that our method has the important practical benefit of allowing lightweight architectures to achieve state-of-the-art performance at high framerates.
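For context, the closed-form Fourier-domain solution referred to above can be written in a few lines; it is this closed form that makes the learner cheap to evaluate and to differentiate through. The single-channel ridge-regression version below is a standard textbook formulation, not the paper's multi-channel layer, and the regularization value is illustrative.

```python
import numpy as np

# Train a single-channel correlation filter: given template features x and a
# desired response y (e.g. a Gaussian peak), solve ridge regression per
# frequency in the Fourier domain.
def train_correlation_filter(x, y, lam=1e-2):
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    return (np.conj(X) * Y) / (np.conj(X) * X + lam)

# Apply the filter to search-region features z; the peak of the real-valued
# response map gives the estimated target translation.
def apply_correlation_filter(W, z):
    Z = np.fft.fft2(z)
    return np.real(np.fft.ifft2(W * Z))
```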
We propose to address quadrupedal locomotion tasks using Reinforcement Learning (RL) with a Transformer-based model that learns to combine proprioceptive information and high-dimensional depth sensor inputs. While learning-based locomotion has made great advances using RL, most methods still rely on domain randomization for training blind agents that generalize to challenging terrains. Our key insight is that proprioceptive states only offer contact measurements for immediate reaction, whereas an agent equipped with visual sensory observations can learn to proactively maneuver environments with obstacles and uneven terrain by anticipating changes in the environment many steps ahead. In this paper, we introduce LocoTransformer, an end-to-end RL method for quadrupedal locomotion that leverages a Transformer-based model for fusing proprioceptive states and visual observations. We evaluate our method in challenging simulated environments with different obstacles and uneven terrain. We show that our method obtains significant improvements over policies with only proprioceptive state inputs, and that Transformer-based models further improve generalization across environments. Our project page with videos is at https://RchalYang.github.io/LocoTransformer .
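A rough sketch of the kind of fusion described above: the proprioceptive state becomes one token, depth-image patches become the remaining tokens, and self-attention mixes them before a policy head. Dimensions, the patch embedding, and the readout are illustrative assumptions, not the LocoTransformer architecture verbatim.

```python
import torch
import torch.nn as nn

# Transformer-based fusion of proprioception and vision for a locomotion policy.
class FusionPolicy(nn.Module):
    def __init__(self, proprio_dim=30, patch_dim=64, embed_dim=64, act_dim=12):
        super().__init__()
        self.proprio_embed = nn.Linear(proprio_dim, embed_dim)
        self.patch_embed = nn.Linear(patch_dim, embed_dim)  # flattened depth patches
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(embed_dim, act_dim)

    def forward(self, proprio, patches):
        # proprio: (B, proprio_dim); patches: (B, n_patches, patch_dim)
        tokens = torch.cat(
            [self.proprio_embed(proprio).unsqueeze(1), self.patch_embed(patches)],
            dim=1,
        )
        fused = self.encoder(tokens)
        return self.head(fused[:, 0])  # read the action from the proprio token
```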
This paper investigates the accuracy and robustness of car-following (CF) and adaptive cruise control (ACC) models used to simulate the measured driving behaviour of commercial ACCs. To this aim, a general modelling framework is proposed in which ACC and CF models are incrementally augmented with physics extensions, namely perception delay, linear or nonlinear vehicle dynamics, and acceleration constraints. The framework is applied to the Intelligent Driver Model (IDM), the Gipps model, and three basic ACCs: a linear controller coupled with a constant time-headway spacing policy and with two other policies derived from traffic flow theory, the IDM desired-distance function and the Gipps equilibrium distance-speed function. The ninety models resulting from the combination of the five base models and the aforementioned physics extensions are assessed and compared through an extensive calibration and validation experiment against measured trajectory data of low-level automated vehicles. When a single extension is applied, perception delay and linear dynamics increase modelling accuracy the most, whatever the base model considered. Among the models, Gipps-based ones outperform all other CF and ACC models in calibration; even among ACCs, the linear controllers coupled with a Gipps spacing policy perform best. On the other hand, IDM-based models are by far the most robust in validation, showing almost no crashes when calibrated parameters are used to simulate different trajectories. Overall, the paper shows the importance of cross-fertilization between traffic flow and vehicle studies.
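Two of the base models named in this abstract have compact closed forms; the sketch below shows the IDM acceleration law and a linear ACC controller with a constant time-headway spacing policy. The gains and parameter values are typical textbook choices, not the calibrated values from the paper.

```python
import numpy as np

# Intelligent Driver Model (IDM): acceleration from own speed v, gap to the
# leader, and closing speed dv = v_follower - v_leader.
def idm_acceleration(v, gap, dv, v0=33.3, T=1.5, a=1.0, b=2.0, s0=2.0, delta=4):
    s_star = s0 + max(0.0, v * T + v * dv / (2 * np.sqrt(a * b)))  # desired gap
    return a * (1 - (v / v0) ** delta - (s_star / gap) ** 2)

# Linear ACC controller with a constant time-headway spacing policy: the
# desired gap grows linearly with speed, and acceleration is a weighted sum of
# the spacing error and the speed difference to the leader.
def linear_acc_acceleration(v, gap, v_leader, t_headway=1.5, k_s=0.23, k_v=0.07):
    desired_gap = t_headway * v
    return k_s * (gap - desired_gap) + k_v * (v_leader - v)
```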
