No Arabic abstract
The goal of this thesis is to design a learning model predictive controller (LMPC) that allows multiple agents to race competitively on a predefined race track in real-time. This thesis addresses two major shortcomings in the already existing single-agent formulation. Previously, the agent determines a locally optimal trajectory but does not explore the state space, which may be necessary for overtaking maneuvers. Additionally, obstacle avoidance for LMPC has been achieved in the past by using a non-convex terminal set, which increases the complexity for determining a solution to the optimization problem. The proposed algorithm for multi-agent racing explores the state space by executing the LMPC for multiple different initializations, which yields a richer terminal safe set. Furthermore, a new method for selecting states in the terminal set is developed, which keeps the convexity for the terminal safe set and allows for taking suboptimal states.
In this paper, we introduce an actor-critic algorithm called Deep Value Model Predictive Control (DMPC), which combines model-based trajectory optimization with value function estimation. The DMPC actor is a Model Predictive Control (MPC) optimizer with an objective function defined in terms of a value function estimated by the critic. We show that our MPC actor is an importance sampler, which minimizes an upper bound of the cross-entropy to the state distribution of the optimal sampling policy. In our experiments with a Ballbot system, we show that our algorithm can work with sparse and binary reward signals to efficiently solve obstacle avoidance and target reaching tasks. Compared to previous work, we show that including the value function in the running cost of the trajectory optimizer speeds up the convergence. We also discuss the necessary strategies to robustify the algorithm in practice.
Despite the rich theoretical foundation of model-based deep reinforcement learning (RL) agents, their effectiveness in real-world robotics-applications is less studied and understood. In this paper, we, therefore, investigate how such agents generalize to real-world autonomous-vehicle control-tasks, where advanced model-free deep RL algorithms fail. In particular, we set up a series of time-lap tasks for an F1TENTH racing robot, equipped with high-dimensional LiDAR sensors, on a set of test tracks with a gradual increase in their complexity. In this continuous-control setting, we show that model-based agents capable of learning in imagination, substantially outperform model-free agents with respect to performance, sample efficiency, successful task completion, and generalization. Moreover, we show that the generalization ability of model-based agents strongly depends on the observation-model choice. Finally, we provide extensive empirical evidence for the effectiveness of model-based agents provided with long enough memory horizons in sim2real tasks.
In this paper we present a Learning Model Predictive Control (LMPC) strategy for linear and nonlinear time optimal control problems. Our work builds on existing LMPC methodologies and it guarantees finite time convergence properties for the closed-loop system. We show how to construct a time varying safe set and terminal cost function using closed-loop data. The resulting LMPC policy is time varying and it guarantees recursive constraint satisfaction and non-decreasing performance. Computational efficiency is obtained by convexifing the safe set and terminal cost function. We demonstrate that, for a class of nonlinear system and convex constraints, the convex LMPC formulation guarantees recursive constraint satisfaction and non-decreasing performance. Finally, we illustrate the effectiveness of the proposed strategies on minimum time obstacle avoidance and racing examples.
Accurately modeling robot dynamics is crucial to safe and efficient motion control. In this paper, we develop and apply an iterative learning semi-parametric model, with a neural network, to the task of autonomous racing with a Model Predictive Controller (MPC). We present a novel non-linear semi-parametric dynamics model where we represent the known dynamics with a parametric model, and a neural network captures the unknown dynamics. We show that our model can learn more accurately than a purely parametric model and generalize better than a purely non-parametric model, making it ideal for real-world applications where collecting data from the full state space is not feasible. We present a system where the model is bootstrapped on pre-recorded data and then updated iteratively at run time. Then we apply our iterative learning approach to the simulated problem of autonomous racing and show that it can safely adapt to modified dynamics online and even achieve better performance than models trained on data from manual driving.
Model predictive control (MPC) is an effective method for controlling robotic systems, particularly autonomous aerial vehicles such as quadcopters. However, application of MPC can be computationally demanding, and typically requires estimating the state of the system, which can be challenging in complex, unstructured environments. Reinforcement learning can in principle forego the need for explicit state estimation and acquire a policy that directly maps sensor readings to actions, but is difficult to apply to unstable systems that are liable to fail catastrophically during training before an effective policy has been found. We propose to combine MPC with reinforcement learning in the framework of guided policy search, where MPC is used to generate data at training time, under full state observations provided by an instrumented training environment. This data is used to train a deep neural network policy, which is allowed to access only the raw observations from the vehicles onboard sensors. After training, the neural network policy can successfully control the robot without knowledge of the full state, and at a fraction of the computational cost of MPC. We evaluate our method by learning obstacle avoidance policies for a simulated quadrotor, using simulated onboard sensors and no explicit state estimation at test time.