
Track based Offline Policy Learning for Overtaking Maneuvers with Autonomous Racecars

Published by Dr. Johannes Betz
Publication date: 2021
Research field: Informatics Engineering
Research language: English





The rising popularity of driverless cars has led to research and development in the field of autonomous racing, where overtaking is a particularly challenging task. Vehicles have to detect and operate at the limits of dynamic handling, and decisions in the car have to be made at high speeds and high accelerations. One of the most crucial parts of autonomous racing is path planning and decision making for an overtaking maneuver against a dynamic opponent vehicle. In this paper we present the evaluation of a track-based offline policy learning approach for autonomous racing. We define specific track portions and conduct offline experiments to evaluate the probability of an overtaking maneuver based on the speed and position of the ego vehicle. Based on these experiments we define overtaking probability distributions for each of the track portions. Further, we propose a switching MPCC controller setup that incorporates the learnt policies to achieve a higher rate of overtaking maneuvers. Through exhaustive simulations, we show that our proposed algorithm is able to increase the number of overtakes at different track portions.
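The idea of turning offline experiments into per-portion overtaking probabilities can be sketched as follows. This is only a minimal illustration, not the paper's implementation: the trial record layout, the bin labels, the function names `overtake_probabilities` and `select_mode`, and the 0.5 threshold are all assumptions introduced here, and the MPCC controllers themselves are not modeled.

```python
from collections import defaultdict

def overtake_probabilities(trials):
    """Estimate empirical overtaking probabilities from offline trials.

    Each trial is a hypothetical record
    (track_portion, speed_bin, position_bin, success).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [successes, attempts]
    for portion, speed_bin, position_bin, success in trials:
        entry = counts[(portion, speed_bin, position_bin)]
        entry[0] += int(success)
        entry[1] += 1
    return {key: s / n for key, (s, n) in counts.items()}

def select_mode(probabilities, portion, speed_bin, position_bin, threshold=0.5):
    """Pick a controller mode; unseen situations default to following."""
    p = probabilities.get((portion, speed_bin, position_bin), 0.0)
    return "overtake" if p >= threshold else "follow"

# Illustrative data only, not results from the paper.
trials = [
    ("straight", "high", "left", True),
    ("straight", "high", "left", True),
    ("straight", "high", "left", False),
    ("hairpin", "low", "right", False),
]
probs = overtake_probabilities(trials)
mode = select_mode(probs, "straight", "high", "left")
```

A switching setup would then hand control to an overtaking-oriented or a following-oriented controller depending on the returned mode.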


Read also

This paper proposes a novel framework for addressing the challenge of autonomous overtaking and obstacle avoidance, which incorporates the overtaking path planning into Gaussian Process-based model predictive control (GPMPC). Compared with the conventional control strategies, this approach has two main advantages. Firstly, combining Gaussian Process (GP) regression with a nominal model allows for learning from model mismatch and unmodeled dynamics, which enhances a simple model and delivers significantly better results. Due to the approximation for propagating uncertainties, we can furthermore satisfy the constraints and thereby safety of the vehicle is ensured. Secondly, we convert the geometric relationship between the ego vehicle and other obstacle vehicles into the constraints. Without relying on a higher-level path planner, this approach substantially reduces the computational burden. In addition, we transform the state constraints under the model predictive control (MPC) framework into a soft constraint and incorporate it as relaxed barrier function into the cost function, which makes the optimizer more efficient. Simulation results reveal the usefulness of the proposed approach.
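The abstract above does not state which relaxed barrier function is used. As a hedged illustration of the general idea, the sketch below uses one common form, a logarithmic barrier with a quadratic extension below a relaxation parameter; the function name `relaxed_log_barrier` and the parameter `delta` are assumptions introduced here.

```python
import math

def relaxed_log_barrier(z, delta=0.1):
    """Soft penalty for the constraint z >= 0.

    For z > delta this is the usual log barrier -log(z). For z <= delta it
    switches to a quadratic that matches the value and slope at z = delta,
    so the penalty stays finite even when the constraint is violated.
    """
    if z > delta:
        return -math.log(z)
    return 0.5 * (((z - 2.0 * delta) / delta) ** 2 - 1.0) - math.log(delta)

# Example: penalty terms for a constraint margin that shrinks and goes negative.
penalties = [relaxed_log_barrier(z) for z in (1.0, 0.1, 0.0, -0.5)]
```

Because the penalty is defined for every real `z`, it can be added to an MPC cost function without making the problem infeasible when a state constraint is temporarily violated.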
In this paper we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, and partial state observations. We discuss an algorithm that uses multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. This algorithm is also used for policy improvement in an approximate policy iteration scheme, where successive policies are approximated by using a neural network classifier. A novel feature of our approach is that it is well suited for distributed computation through an extended belief space formulation and the use of a partitioned architecture, which is trained with multiple neural networks. We apply our methods in simulation to a class of sequential repair problems where a robot inspects and repairs a pipeline with potentially several rupture sites under partial information about the state of the pipeline.
This paper proposes a life-long adaptive path tracking policy learning method for autonomous vehicles that can self-evolve and self-adapt with multi-task knowledge. Firstly, the proposed method can learn a model-free control policy for path tracking directly from the historical driving experience, where the property of vehicle dynamics and corresponding control strategy can be learned simultaneously. Secondly, by utilizing the life-long learning method, the proposed method can learn the policy with task-incremental knowledge without encountering catastrophic forgetting. Thus, with continual multi-task knowledge learned, the policy can iteratively adapt to new tasks and improve its performance with knowledge from new tasks. Thirdly, a memory evaluation and updating method is applied to optimize memory structure for life-long learning which enables the policy to learn toward selected directions. Experiments are conducted using a high-fidelity vehicle dynamic model in a complex curvy road to evaluate the performance of the proposed method. Results show that the proposed method can effectively evolve with continual multi-task knowledge and adapt to the new environment, where the performance of the proposed method can also surpass two commonly used baseline methods after evolving.
We propose an imitation learning system for autonomous driving in urban traffic with interactions. We train a Behavioral Cloning (BC) policy to imitate driving behavior collected from real urban traffic, and apply the data aggregation algorithm to improve its performance iteratively. Applying data aggregation in this setting comes with two challenges. The first challenge is that it is expensive and dangerous to collect online rollout data in real urban traffic. Creating similar traffic scenarios in a simulator like CARLA for online rollout collection can also be difficult. Instead, we propose to create a weak simulator from the training dataset, in which all the surrounding vehicles follow the data trajectory provided by the dataset. We find that the online data collected in such a simulator can still be used to improve the BC policy's performance. The second challenge is the tedious and time-consuming human labelling process during online rollout. To solve this problem, we use an A$^*$ planner as a pseudo-expert to provide expert-like demonstrations. We validate our proposed imitation learning system in real urban traffic scenarios. The experimental results show that our system can significantly improve the performance of the baseline BC policy.
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. This problem setting offers the promise of utilizing such datasets to acquire policies without any costly or dangerous active exploration. However, it is also challenging, due to the distributional shift between the offline training data and those states visited by the learned policy. Despite significant recent progress, the most successful prior methods are model-free and constrain the policy to the support of data, precluding generalization to unseen states. In this paper, we first observe that an existing model-based RL algorithm already produces significant gains in the offline setting compared to model-free approaches. However, standard model-based RL methods, designed for the online setting, do not provide an explicit mechanism to avoid the offline setting's distributional shift issue. Instead, we propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics. We theoretically show that the algorithm maximizes a lower bound of the policy's return under the true MDP. We also characterize the trade-off between the gain and risk of leaving the support of the batch data. Our algorithm, Model-based Offline Policy Optimization (MOPO), outperforms standard model-based RL algorithms and prior state-of-the-art model-free offline RL algorithms on existing offline RL benchmarks and two challenging continuous control tasks that require generalizing from data collected for a different task. The code is available at https://github.com/tianheyu927/mopo.
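The phrase "rewards artificially penalized by the uncertainty of the dynamics" can be illustrated with a small sketch. It is not taken from the MOPO code base: the uncertainty estimate used here (the largest distance of any ensemble member's next-state prediction from the ensemble mean), the function names, and the coefficient `penalty_coef` are assumptions chosen for illustration.

```python
import math

def ensemble_uncertainty(predictions):
    """Hypothetical uncertainty estimate from an ensemble of dynamics models.

    `predictions` holds one predicted next state (a list of floats) per
    ensemble member; the estimate is the largest Euclidean distance of any
    member's prediction from the ensemble mean.
    """
    n = len(predictions)
    dim = len(predictions[0])
    mean = [sum(p[i] for p in predictions) / n for i in range(dim)]
    return max(math.dist(p, mean) for p in predictions)

def penalized_reward(reward, predictions, penalty_coef=1.0):
    """Reward minus a penalty proportional to the model uncertainty."""
    return reward - penalty_coef * ensemble_uncertainty(predictions)

# Members agree: no penalty. Members disagree: the reward is reduced.
agree = penalized_reward(5.0, [[1.0, 0.0], [1.0, 0.0]], penalty_coef=2.0)
disagree = penalized_reward(5.0, [[0.0, 0.0], [2.0, 0.0]], penalty_coef=2.0)
```

A policy trained on such penalized rewards is discouraged from visiting state-action pairs where the learned dynamics model is unreliable, which is the trade-off between gain and risk that the abstract describes.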