GLiDE: Generalizable Quadrupedal Locomotion in Diverse Environments with a Centroidal Model

Published by: Zhaoming Xie
Publication date: 2021
Research field: Informatics Engineering
Paper language: English

Model-free reinforcement learning (RL) for legged locomotion commonly relies on a physics simulator that can accurately predict the behaviors of every degree of freedom of the robot. In contrast, approximate reduced-order models are often sufficient for many model-based control strategies. In this work we explore how RL can be effectively used with a centroidal model to generate robust control policies for quadrupedal locomotion. Advantages over RL with a full-order model include a simple reward structure, reduced computational costs, and robust sim-to-real transfer. We further show the potential of the method by demonstrating stepping-stone locomotion, two-legged in-place balance, balance beam locomotion, and sim-to-real transfer without further adaptations. Additional Results: https://www.pair.toronto.edu/glide-quadruped/.
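As a rough, self-contained sketch of the reduced-order setting this abstract describes, the snippet below integrates a centroidal (single-rigid-body) model in which the policy's actions are per-foot ground-reaction forces. The mass, timestep, state layout, and function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Minimal sketch of a centroidal model used as an RL environment:
# the robot is reduced to its center of mass (CoM), and the policy
# chooses ground-reaction forces at the stance feet. Angular dynamics
# (torque about the CoM) are omitted for brevity.

MASS = 12.0                              # kg, assumed robot mass
GRAVITY = np.array([0.0, 0.0, -9.81])    # m/s^2
DT = 0.01                                # s, assumed integration step

def centroidal_step(com_pos, com_vel, foot_forces):
    """Advance CoM translation one step under per-foot contact forces.

    com_pos, com_vel : (3,) CoM position / velocity in the world frame
    foot_forces      : (4, 3) ground-reaction forces chosen by the policy
    """
    total_force = foot_forces.sum(axis=0) + MASS * GRAVITY
    com_acc = total_force / MASS
    com_vel = com_vel + DT * com_acc
    com_pos = com_pos + DT * com_vel
    return com_pos, com_vel
```

Because the state is just a handful of CoM quantities rather than every joint of the robot, rollouts are cheap and the reward can be stated directly in terms of CoM tracking, which is the simplification the abstract credits for robust sim-to-real transfer.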




Read also

Deep reinforcement learning (RL) uses model-free techniques to optimize task-specific control policies. Despite having emerged as a promising approach for complex problems, RL is still hard to use reliably for real-world applications. Apart from challenges such as precise reward function tuning, inaccurate sensing and actuation, and non-deterministic response, existing RL methods do not guarantee behavior within required safety constraints that are crucial for real robot scenarios. In this regard, we introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO) for tracking base velocity commands while following the defined constraints. We also introduce schemes which encourage state recovery into constrained regions in case of constraint violations. We present experimental results of our training method and test it on the real ANYmal quadruped robot. We compare our approach against the unconstrained RL method and show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
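One common way to realize such a constrained policy update is a Lagrangian penalty on the expected constraint cost. The sketch below shows that generic pattern, not the authors' exact CPPO formulation; the cost limit and learning rate are assumed values.

```python
# Generic Lagrangian-style constrained policy update (a sketch of the
# broad pattern behind constrained PPO variants, not the paper's CPPO).

lagrange_multiplier = 1.0
multiplier_lr = 1e-2
cost_limit = 0.0   # assumed bound on expected per-step constraint cost

def constrained_loss(ppo_surrogate_loss, constraint_cost):
    # The usual clipped PPO surrogate, penalized by the multiplier;
    # the gradient flows through constraint_cost to the policy.
    return ppo_surrogate_loss + lagrange_multiplier * (constraint_cost - cost_limit)

def update_multiplier(constraint_cost):
    # Dual ascent: grow the penalty while the constraint is violated,
    # shrink it (clipped at zero) once the policy satisfies the limit.
    global lagrange_multiplier
    lagrange_multiplier = max(
        0.0,
        lagrange_multiplier + multiplier_lr * (float(constraint_cost) - cost_limit))
```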
The planning of whole-body motion and step time for bipedal locomotion is constructed as a model predictive control (MPC) problem, in which a sequence of optimization problems needs to be solved online. While directly solving these problems is extremely time-consuming, we propose a predictive gait synthesizer to offer immediate solutions. Based on the full-dimensional model, a library of gaits with different speeds and periods is first constructed offline. Then the proposed gait synthesizer generates real-time gaits at 1kHz by synthesizing the gait library based on the online prediction of centroidal dynamics. We prove that the constructed MPC problem can ensure the uniform ultimate boundedness (UUB) of the CoM states and show that our proposed gait synthesizer can provide feasible solutions to the MPC optimization problems. Simulation and experimental results on a bipedal robot with 8 degrees of freedom (DoF) are provided to show the performance and robustness of this approach.
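As a rough picture of how a gait library can be blended online, the sketch below interpolates precomputed gaits over a (speed, period) grid. The grid values, trajectory dimensions, and bilinear blending rule are assumptions for illustration, not the paper's construction.

```python
import numpy as np

# Gaits are precomputed offline on a grid of (speed, period); at run
# time the synthesizer blends the four neighbors of the predicted
# operating point. `library` is a zero placeholder here; in practice
# it would be filled by the offline full-model optimization.

speeds = np.linspace(0.0, 1.0, 6)    # m/s grid (assumed)
periods = np.linspace(0.3, 0.7, 5)   # s grid (assumed)
library = np.zeros((len(speeds), len(periods), 100, 12))  # T x n_states

def synthesize_gait(pred_speed, pred_period):
    """Bilinear blend of the four neighboring library gaits."""
    i = np.clip(np.searchsorted(speeds, pred_speed) - 1, 0, len(speeds) - 2)
    j = np.clip(np.searchsorted(periods, pred_period) - 1, 0, len(periods) - 2)
    a = (pred_speed - speeds[i]) / (speeds[i + 1] - speeds[i])
    b = (pred_period - periods[j]) / (periods[j + 1] - periods[j])
    return ((1 - a) * (1 - b) * library[i, j]
            + a * (1 - b) * library[i + 1, j]
            + (1 - a) * b * library[i, j + 1]
            + a * b * library[i + 1, j + 1])
```

A lookup-and-blend of this kind runs in microseconds, which is what lets the synthesizer serve the MPC loop at 1kHz instead of solving each optimization problem from scratch.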
Recently reinforcement learning (RL) has emerged as a promising approach for quadrupedal locomotion, which can save the manual effort in conventional approaches such as designing skill-specific controllers. However, due to the complex nonlinear dynamics in quadrupedal robots and reward sparsity, it is still difficult for RL to learn effective gaits from scratch, especially in challenging tasks such as walking over the balance beam. To alleviate such difficulty, we propose a novel RL-based approach that contains an evolutionary foot trajectory generator. Unlike prior methods that use a fixed trajectory generator, the generator continually optimizes the shape of the output trajectory for the given task, providing diversified motion priors to guide the policy learning. The policy is trained with reinforcement learning to output residual control signals that fit different gaits. We then optimize the trajectory generator and policy network alternately to stabilize the training and share the exploratory data to improve sample efficiency. As a result, our approach can solve a range of challenging tasks in simulation by learning from scratch, including walking on a balance beam and crawling through the cave. To further verify the effectiveness of our approach, we deploy the controller learned in the simulation on a 12-DoF quadrupedal robot, and it can successfully traverse challenging scenarios with efficient gaits.
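The alternating structure described above can be sketched as a loop that interleaves an RL phase and an evolutionary phase. The helper functions `ppo_update` and `evaluate` below are hypothetical placeholders standing in for the residual-policy update and the task-return fitness; the population size and noise scale are likewise assumptions.

```python
import numpy as np

def ppo_update(policy, generator_params):
    """Placeholder for one RL update of the residual policy."""
    return policy

def evaluate(policy, generator_params):
    """Placeholder fitness: task return achieved with these parameters."""
    return -np.sum(generator_params ** 2)

def train_alternating(generator_params, policy, n_iters=100, pop=16,
                      sigma=0.05, rng=np.random.default_rng(0)):
    for _ in range(n_iters):
        # Phase 1: train the residual policy against the current
        # foot-trajectory generator.
        policy = ppo_update(policy, generator_params)
        # Phase 2: evolutionary step on the generator's shape
        # parameters, keeping the best of a perturbed population.
        candidates = (generator_params
                      + sigma * rng.standard_normal((pop,) + generator_params.shape))
        scores = [evaluate(policy, c) for c in candidates]
        generator_params = candidates[int(np.argmax(scores))]
    return generator_params, policy
```

Alternating rather than jointly optimizing keeps each phase's target fixed, which is the stabilization effect the abstract attributes to this scheme.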
Planning locomotion trajectories for legged microrobots is challenging because of their complex morphology, high frequency passive dynamics, and discontinuous contact interactions with their environment. Consequently, such research is often driven by time-consuming experimental methods. As an alternative, we present a framework for systematically modeling, planning, and controlling legged microrobots. We develop a three-dimensional dynamic model of a 1.5 gram quadrupedal microrobot with complexity (e.g., number of degrees of freedom) similar to larger-scale legged robots. We then adapt a recently developed variational contact-implicit trajectory optimization method to generate feasible whole-body locomotion plans for this microrobot, and we demonstrate that these plans can be tracked with simple joint-space controllers. We plan and execute periodic gaits at multiple stride frequencies and on various surfaces. These gaits achieve high per-cycle velocities, including a maximum of 10.87 mm/cycle, which is 15% faster than previously measured velocities for this microrobot. Furthermore, we plan and execute a vertical jump of 9.96 mm, which is 78% of the microrobot's center-of-mass height. To the best of our knowledge, this is the first end-to-end demonstration of planning and tracking whole-body dynamic locomotion on a millimeter-scale legged microrobot.
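The defining feature of contact-implicit trajectory optimization is that contact forces are decision variables coupled to the geometry through a complementarity condition, rather than fixed by a prescribed contact schedule. Schematically, with notation assumed here rather than taken from the paper:

```latex
% At each knot point k: the gap function \phi(q_k) (signed distance of
% a foot to the ground) and the normal contact force \lambda_k are both
% nonnegative, and at least one is zero -- force only when in contact,
% and no penetration.
0 \le \phi(q_k) \;\perp\; \lambda_k \ge 0
```

This is what lets the optimizer discover when and where the microrobot's feet make contact, instead of requiring that discontinuous schedule to be hand-specified.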
Understanding the gap between simulation and reality is critical for reinforcement learning with legged robots, which are largely trained in simulation. However, recent work has resulted in sometimes conflicting conclusions with regard to which factors are important for success, including the role of dynamics randomization. In this paper, we aim to provide clarity and understanding on the role of dynamics randomization in learning robust locomotion policies for the Laikago quadruped robot. Surprisingly, in contrast to prior work with the same robot model, we find that direct sim-to-real transfer is possible without dynamics randomization or on-robot adaptation schemes. We conduct extensive ablation studies in a sim-to-sim setting to understand the key issues underlying successful policy transfer, including other design decisions that can impact policy robustness. We further ground our conclusions via sim-to-real experiments with various gaits, speeds, and stepping frequencies. Additional Details: https://www.pair.toronto.edu/understanding-dr/.
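For contrast with the no-randomization finding above, a minimal sketch of what per-episode dynamics randomization typically looks like in practice is given below; the parameter set and ranges are illustrative assumptions, not Laikago-specific values.

```python
import numpy as np

# Per-episode dynamics randomization: each episode, the simulator's
# physical parameters are resampled from fixed ranges so the policy
# cannot overfit to one set of dynamics. Ranges here are assumed.

RANDOMIZATION_RANGES = {
    "ground_friction": (0.5, 1.25),
    "base_mass_scale": (0.8, 1.2),
    "motor_strength_scale": (0.8, 1.2),
    "latency_s": (0.0, 0.04),
}

def sample_dynamics(rng: np.random.Generator):
    """Draw one set of simulator parameters for the next episode."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in RANDOMIZATION_RANGES.items()}

# Usage: params = sample_dynamics(np.random.default_rng(0))
```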
