Understanding the gap between simulation and reality is critical for reinforcement learning with legged robots, which are largely trained in simulation. However, recent work has resulted in sometimes conflicting conclusions with regard to which factors are important for success, including the role of dynamics randomization. In this paper, we aim to provide clarity and understanding on the role of dynamics randomization in learning robust locomotion policies for the Laikago quadruped robot. Surprisingly, in contrast to prior work with the same robot model, we find that direct sim-to-real transfer is possible without dynamics randomization or on-robot adaptation schemes. We conduct extensive ablation studies in a sim-to-sim setting to understand the key issues underlying successful policy transfer, including other design decisions that can impact policy robustness. We further ground our conclusions via sim-to-real experiments with various gaits, speeds, and stepping frequencies. Additional Details: https://www.pair.toronto.edu/understanding-dr/.
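As a rough illustration of what dynamics randomization means in this setting, the sketch below resamples a set of simulator parameters at the start of each training episode, with a switch to disable randomization as in an ablation. The parameter names, ranges, and the `sample_dynamics` helper are hypothetical placeholders, not values or code from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DynamicsParams:
    """Simulator parameters resampled per episode (all names hypothetical)."""
    mass_scale: float      # multiplier on nominal link masses
    friction: float        # foot-ground friction coefficient
    motor_strength: float  # multiplier on nominal motor torque
    latency_s: float       # actuation latency in seconds


# Hypothetical ranges chosen for illustration only; not taken from the paper.
RANGES = {
    "mass_scale": (0.8, 1.2),
    "friction": (0.5, 1.25),
    "motor_strength": (0.8, 1.2),
    "latency_s": (0.0, 0.04),
}


def sample_dynamics(rng: random.Random, randomize: bool = True) -> DynamicsParams:
    """Draw one parameter set; with randomize=False return the range midpoints."""
    if randomize:
        values = {k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()}
    else:
        values = {k: 0.5 * (lo + hi) for k, (lo, hi) in RANGES.items()}
    return DynamicsParams(**values)
```

In a training loop, `sample_dynamics(rng)` would be called once per episode reset, and `randomize=False` gives the fixed-dynamics baseline that the abstract reports can already transfer.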
Deep reinforcement learning (RL) uses model-free techniques to optimize task-specific control policies. Despite having emerged as a promising approach for complex problems, RL is still hard to use reliably for real-world applications. Apart from challenges such as precise reward function tuning, inaccurate sensing and actuation, and non-deterministic response, existing RL methods do not guarantee behavior within required safety constraints that are crucial for real robot scenarios. In this regard, we introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO) for tracking base velocity commands while following the defined constraints. We also introduce schemes which encourage state recovery into constrained regions in case of constraint violations. We present experimental results of our training method and test it on the real ANYmal quadruped robot. We compare our approach against the unconstrained RL method and show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
Planning locomotion trajectories for legged microrobots is challenging because of their complex morphology, high-frequency passive dynamics, and discontinuous contact interactions with their environment. Consequently, such research is often driven by time-consuming experimental methods. As an alternative, we present a framework for systematically modeling, planning, and controlling legged microrobots. We develop a three-dimensional dynamic model of a 1.5 gram quadrupedal microrobot with complexity (e.g., number of degrees of freedom) similar to larger-scale legged robots. We then adapt a recently developed variational contact-implicit trajectory optimization method to generate feasible whole-body locomotion plans for this microrobot, and we demonstrate that these plans can be tracked with simple joint-space controllers. We plan and execute periodic gaits at multiple stride frequencies and on various surfaces. These gaits achieve high per-cycle velocities, including a maximum of 10.87 mm/cycle, which is 15% faster than previously measured velocities for this microrobot. Furthermore, we plan and execute a vertical jump of 9.96 mm, which is 78% of the microrobot's center-of-mass height. To the best of our knowledge, this is the first end-to-end demonstration of planning and tracking whole-body dynamic locomotion on a millimeter-scale legged microrobot.
In this paper, we aim to improve the robustness of dynamic quadrupedal locomotion through two aspects: 1) fast model predictive foothold planning, and 2) applying LQR to projected inverse dynamic control for robust motion tracking. In our proposed planning and control framework, foothold plans are updated at 400 Hz considering the current robot state and an LQR controller generates optimal feedback gains for motion tracking. The LQR optimal gain matrix with non-zero off-diagonal elements leverages the coupling of dynamics to compensate for system underactuation. Meanwhile, the projected inverse dynamic control complements the LQR to satisfy inequality constraints. In addition to these contributions, we show robustness of our control framework to unmodeled adaptive feet. Experiments on the quadruped ANYmal demonstrate the effectiveness of the proposed method for robust dynamic locomotion given external disturbances and environmental uncertainties.
We present a legged motion planning approach for quadrupedal locomotion over challenging terrain. We decompose the problem into body action planning and footstep planning. We use a lattice representation together with a set of defined body movement primitives for computing a body action plan. The lattice representation allows us to plan versatile movements while ensuring feasibility for every possible plan. To this end, we propose a set of rules that define the footstep search regions and footstep sequence given a body action. We use Anytime Repairing A* (ARA*) search, which guarantees bounded suboptimal plans. Our main contribution is a planning approach that generates versatile movements on-line. Experimental trials demonstrate the performance of our planning approach in a set of challenging terrain conditions. The terrain information and plans are computed on-line and on-board.
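The bounded-suboptimality property of ARA* mentioned above can be sketched with a simplified anytime loop: weighted A* is run with a decreasing inflation factor `eps`, and with a consistent heuristic each returned plan costs at most `eps` times the optimum. This is a simplification, not the paper's planner: full ARA* reuses search effort between iterations, whereas this sketch restarts each search. The 4-connected grid and all function names are hypothetical stand-ins for the body-action lattice.

```python
import heapq


def weighted_astar(grid, start, goal, eps):
    """Weighted A* on a 4-connected grid; returns the path cost or None.

    grid[r][c] == 1 marks an obstacle. With the Manhattan heuristic (consistent
    for unit step costs), the returned cost is at most eps * optimal.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    g = {start: 0}
    open_heap = [(eps * h(start), start)]
    closed = set()
    while open_heap:
        _, cell = heapq.heappop(open_heap)
        if cell == goal:
            return g[cell]
        if cell in closed:
            continue  # stale heap entry; each state is expanded at most once
        closed.add(cell)
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
                continue
            new_g = g[cell] + 1
            if nxt not in closed and new_g < g.get(nxt, float("inf")):
                g[nxt] = new_g
                heapq.heappush(open_heap, (new_g + eps * h(nxt), nxt))
    return None  # goal unreachable


def anytime_plan(grid, start, goal, eps_schedule=(3.0, 2.0, 1.0)):
    """Yield (eps, cost) pairs as the suboptimality bound is tightened."""
    for eps in eps_schedule:
        yield eps, weighted_astar(grid, start, goal, eps)
```

A caller with a time budget can consume `anytime_plan` until the budget runs out and keep the last plan; the final entry, with `eps = 1.0`, is the optimal cost.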
Continuous robot operation in extreme scenarios such as underground mines or sewers is difficult because exteroceptive sensors may fail due to fog, darkness, dirt or malfunction. To enable autonomous navigation in these situations, we have developed a proprioceptive localization method that exploits the foot contacts made by a quadruped robot to localize against a prior map of an environment, without the help of any camera or LIDAR sensor. The proposed method enables the robot to accurately re-localize itself after making a sequence of contact events over a terrain feature. The method is based on Sequential Monte Carlo and can support both 2.5D and 3D prior map representations. We have tested the approach online and onboard the ANYmal quadruped robot in two different scenarios: the traversal of a custom-built wooden terrain course and a wall probing and following task. In both scenarios, the robot is able to effectively achieve a localization match and to execute a desired pre-planned path. The method keeps the localization error down to 10 cm on feature-rich terrain using only its feet and its kinematic and inertial sensing.
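The Sequential Monte Carlo idea can be illustrated with a minimal one-dimensional particle filter, in which each foot contact yields a terrain height that is compared against a prior map. This is a sketch under strong assumptions, not the paper's implementation: the robot moves along a line, the map is the made-up function `terrain_height`, and the noise levels and particle count are arbitrary illustrative values.

```python
import math
import random


def terrain_height(x: float) -> float:
    """Hypothetical prior map: a bumpy ramp, height in metres at position x."""
    return 0.1 * x + 0.05 * math.sin(2.0 * x)


def localize(contacts, step_len, n_particles=500, x_range=(0.0, 6.0),
             meas_sigma=0.02, motion_sigma=0.02, seed=0):
    """Estimate the final position from a sequence of foot-contact heights.

    contacts: measured terrain heights, one per step (e.g. from leg kinematics).
    step_len: odometry estimate of forward travel between contacts.
    """
    rng = random.Random(seed)
    particles = [rng.uniform(*x_range) for _ in range(n_particles)]
    for i, z in enumerate(contacts):
        if i > 0:
            # Motion update: propagate each hypothesis with noisy odometry.
            particles = [p + step_len + rng.gauss(0.0, motion_sigma)
                         for p in particles]
        # Measurement update: Gaussian likelihood of the contact height.
        weights = [math.exp(-0.5 * ((z - terrain_height(p)) / meas_sigma) ** 2)
                   for p in particles]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # uniform fallback if all underflow
        # Multinomial resampling proportional to weight.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return sum(particles) / n_particles
```

A single contact on this map is consistent with a wide band of positions, so the estimate only sharpens after a sequence of contacts. This mirrors the abstract's point that re-localization follows a series of contact events over a terrain feature.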