Learning Model Predictive Control for Competitive Autonomous Racing

70 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Lukas Brunke

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Lukas Brunke

التعلم الآلي علم الروبوتات التحسين والتحكم

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The goal of this thesis is to design a learning model predictive controller (LMPC) that allows multiple agents to race competitively on a predefined race track in real-time. This thesis addresses two major shortcomings in the already existing single-agent formulation. Previously, the agent determines a locally optimal trajectory but does not explore the state space, which may be necessary for overtaking maneuvers. Additionally, obstacle avoidance for LMPC has been achieved in the past by using a non-convex terminal set, which increases the complexity for determining a solution to the optimization problem. The proposed algorithm for multi-agent racing explores the state space by executing the LMPC for multiple different initializations, which yields a richer terminal safe set. Furthermore, a new method for selecting states in the terminal set is developed, which keeps the convexity for the terminal safe set and allows for taking suboptimal states.

قيم البحث

اقرأ أيضاً

Deep Value Model Predictive Control

103 - Farbod Farshidian , David Hoeller , Marco Hutter 2019

In this paper, we introduce an actor-critic algorithm called Deep Value Model Predictive Control (DMPC), which combines model-based trajectory optimization with value function estimation. The DMPC actor is a Model Predictive Control (MPC) optimizer w ith an objective function defined in terms of a value function estimated by the critic. We show that our MPC actor is an importance sampler, which minimizes an upper bound of the cross-entropy to the state distribution of the optimal sampling policy. In our experiments with a Ballbot system, we show that our algorithm can work with sparse and binary reward signals to efficiently solve obstacle avoidance and target reaching tasks. Compared to previous work, we show that including the value function in the running cost of the trajectory optimizer speeds up the convergence. We also discuss the necessary strategies to robustify the algorithm in practice.

التعلم الآلي علم الروبوتات التعلم الالي

Model-based versus Model-free Deep Reinforcement Learning for Autonomous Racing Cars

112 - Axel Brunnbauer , Luigi Berducci , Andreas Brandstatter 2021

Despite the rich theoretical foundation of model-based deep reinforcement learning (RL) agents, their effectiveness in real-world robotics-applications is less studied and understood. In this paper, we, therefore, investigate how such agents generali ze to real-world autonomous-vehicle control-tasks, where advanced model-free deep RL algorithms fail. In particular, we set up a series of time-lap tasks for an F1TENTH racing robot, equipped with high-dimensional LiDAR sensors, on a set of test tracks with a gradual increase in their complexity. In this continuous-control setting, we show that model-based agents capable of learning in imagination, substantially outperform model-free agents with respect to performance, sample efficiency, successful task completion, and generalization. Moreover, we show that the generalization ability of model-based agents strongly depends on the observation-model choice. Finally, we provide extensive empirical evidence for the effectiveness of model-based agents provided with long enough memory horizons in sim2real tasks.

التعلم الآلي الذكاء الاصطناعي الحوسبة العصبية والتطورية

Minimum Time Learning Model Predictive Control

113 - Ugo Rosolia , Francesco Borrelli 2019

In this paper we present a Learning Model Predictive Control (LMPC) strategy for linear and nonlinear time optimal control problems. Our work builds on existing LMPC methodologies and it guarantees finite time convergence properties for the closed-lo op system. We show how to construct a time varying safe set and terminal cost function using closed-loop data. The resulting LMPC policy is time varying and it guarantees recursive constraint satisfaction and non-decreasing performance. Computational efficiency is obtained by convexifing the safe set and terminal cost function. We demonstrate that, for a class of nonlinear system and convex constraints, the convex LMPC formulation guarantees recursive constraint satisfaction and non-decreasing performance. Finally, we illustrate the effectiveness of the proposed strategies on minimum time obstacle avoidance and racing examples.

أنظمة وتحكم أنظمة وتحكم التحسين والتحكم

Iterative Semi-parametric Dynamics Model Learning For Autonomous Racing

77 - Ignat Georgiev , Christoforos Chatzikomis , Timo Volkl 2020

Accurately modeling robot dynamics is crucial to safe and efficient motion control. In this paper, we develop and apply an iterative learning semi-parametric model, with a neural network, to the task of autonomous racing with a Model Predictive Contr oller (MPC). We present a novel non-linear semi-parametric dynamics model where we represent the known dynamics with a parametric model, and a neural network captures the unknown dynamics. We show that our model can learn more accurately than a purely parametric model and generalize better than a purely non-parametric model, making it ideal for real-world applications where collecting data from the full state space is not feasible. We present a system where the model is bootstrapped on pre-recorded data and then updated iteratively at run time. Then we apply our iterative learning approach to the simulated problem of autonomous racing and show that it can safely adapt to modified dynamics online and even achieve better performance than models trained on data from manual driving.

علم الروبوتات التعلم الآلي أنظمة وتحكم

Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search

215 - Tianhao Zhang , Gregory Kahn , Sergey Levine 2015

Model predictive control (MPC) is an effective method for controlling robotic systems, particularly autonomous aerial vehicles such as quadcopters. However, application of MPC can be computationally demanding, and typically requires estimating the st ate of the system, which can be challenging in complex, unstructured environments. Reinforcement learning can in principle forego the need for explicit state estimation and acquire a policy that directly maps sensor readings to actions, but is difficult to apply to unstable systems that are liable to fail catastrophically during training before an effective policy has been found. We propose to combine MPC with reinforcement learning in the framework of guided policy search, where MPC is used to generate data at training time, under full state observations provided by an instrumented training environment. This data is used to train a deep neural network policy, which is allowed to access only the raw observations from the vehicles onboard sensors. After training, the neural network policy can successfully control the robot without knowledge of the full state, and at a fraction of the computational cost of MPC. We evaluate our method by learning obstacle avoidance policies for a simulated quadrotor, using simulated onboard sensors and no explicit state estimation at test time.

التعلم الآلي علم الروبوتات