No Arabic abstract
Learning-based control aims to construct models of a system to use for planning or trajectory optimization, e.g. in model-based reinforcement learning. In order to obtain guarantees of safety in this context, uncertainty must be accurately quantified. This uncertainty may come from errors in learning (due to a lack of data, for example), or may be inherent to the system. Propagating uncertainty forward in learned dynamics models is a difficult problem. In this work we use deep learning to obtain expressive and flexible models of how distributions of trajectories behave, which we then use for nonlinear Model Predictive Control (MPC). We introduce a deep quantile regression framework for control that enforces probabilistic quantile bounds and quantifies epistemic uncertainty. Using our method we explore three different approaches for learning tubes that contain the possible trajectories of the system, and demonstrate how to use each of them in a Tube MPC scheme. We prove these schemes are recursively feasible and satisfy constraints with a desired margin of probability. We present experiments in simulation on a nonlinear quadrotor system, demonstrating the practical efficacy of these ideas.
Humans can routinely follow a trajectory defined by a list of images/landmarks. However, traditional robot navigation methods require accurate mapping of the environment, localization, and planning. Moreover, these methods are sensitive to subtle changes in the environment. In this paper, we propose a Deep Visual MPC-policy learning method that can perform visual navigation while avoiding collisions with unseen objects on the navigation path. Our model PoliNet takes in as input a visual trajectory and the image of the robots current view and outputs velocity commands for a planning horizon of $N$ steps that optimally balance between trajectory following and obstacle avoidance. PoliNet is trained using a strong image predictive model and traversability estimation model in a MPC setup, with minimal human supervision. Different from prior work, PoliNet can be applied to new scenes without retraining. We show experimentally that the robot can follow a visual trajectory when varying start position and in the presence of previously unseen obstacles. We validated our algorithm with tests both in a realistic simulation environment and in the real world. We also show that we can generate visual trajectories in simulation and execute the corresponding path in the real environment. Our approach outperforms classical approaches as well as previous learning-based baselines in success rate of goal reaching, sub-goal coverage rate, and computational load.
In this work, we consider the problem of deriving and incorporating accurate dynamic models for model predictive control (MPC) with an application to quadrotor control. MPC relies on precise dynamic models to achieve the desired closed-loop performance. However, the presence of uncertainties in complex systems and the environments they operate in poses a challenge in obtaining sufficiently accurate representations of the system dynamics. In this work, we make use of a deep learning tool, knowledge-based neural ordinary differential equations (KNODE), to augment a model obtained from first principles. The resulting hybrid model encompasses both a nominal first-principle model and a neural network learnt from simulated or real-world experimental data. Using a quadrotor, we benchmark our hybrid model against a state-of-the-art Gaussian Process (GP) model and show that the hybrid model provides more accurate predictions of the quadrotor dynamics and is able to generalize beyond the training data. To improve closed-loop performance, the hybrid model is integrated into a novel MPC framework, known as KNODE-MPC. Results show that the integrated framework achieves 73% improvement in simulations and more than 14% in physical experiments, in terms of trajectory tracking performance.
Model Predictive Control (MPC) has shown the great performance of target optimization and constraint satisfaction. However, the heavy computation of the Optimal Control Problem (OCP) at each triggering instant brings the serious delay from state sampling to the control signals, which limits the applications of MPC in resource-limited robot manipulator systems over complicated tasks. In this paper, we propose a novel robust tube-based smooth-MPC strategy for nonlinear robot manipulator planning systems with disturbances and constraints. Based on piecewise linearization and state prediction, our control strategy improves the smoothness and optimizes the delay of the control process. By deducing the deviation of the real system states and the nominal system states, we can predict the next real state set at the current instant. And by using this state set as the initial condition, we can solve the next OCP ahead and store the optimal controls based on the nominal system states, which eliminates the delay. Furthermore, we linearize the nonlinear system with a given upper bound of error, reducing the complexity of the OCP and improving the response speed. Based on the theoretical framework of tube MPC, we prove that the control strategy is recursively feasible and closed-loop stable with the constraints and disturbances. Numerical simulations have verified the efficacy of the designed approach compared with the conventional MPC.
Control policies from imitation learning can often fail to generalize to novel environments due to imperfect demonstrations or the inability of imitation learning algorithms to accurately infer the experts policies. In this paper, we present rigorous generalization guarantees for imitation learning by leveraging the Probably Approximately Correct (PAC)-Bayes framework to provide upper bounds on the expected cost of policies in novel environments. We propose a two-stage training method where a latent policy distribution is first embedded with multi-modal expert behavior using a conditional variational autoencoder, and then fine-tuned in new training environments to explicitly optimize the generalization bound. We demonstrate strong generalization bounds and their tightness relative to empirical performance in simulation for (i) grasping diverse mugs, (ii) planar pushing with visual feedback, and (iii) vision-based indoor navigation, as well as through hardware experiments for the two manipulation tasks.
This paper presents a new approach to deal with the dual problem of system identification and regulation. The main feature consists of breaking the control input to the system into a regulator part and a persistently exciting part. The former is used to regulate the plant using a robust MPC formulation, in which the latter is treated as a bounded additive disturbance. The identification process is executed by a simple recursive least squares algorithm. In order to guarantee sufficient excitation for the identification, an additional non-convex constraint is enforced over the persistently exciting part.