Learning Controller Gains on Bipedal Walking Robots via User Preferences

Added by Noel Csomay-Shanklin

Publication date: 2021

Fields: Informatics Engineering

Language: English

Authors: Noel Csomay-Shanklin, Maegan Tucker, Min Dai

Robotics

Abstract

Experimental demonstration of complex robotic behaviors relies heavily on finding the correct controller gains. This painstaking process is often completed by a domain expert, requiring deep knowledge of the relationship between parameter values and the resulting behavior of the system. Even when such knowledge is possessed, it can take significant effort to navigate the nonintuitive landscape of possible parameter combinations. In this work, we explore the extent to which preference-based learning can be used to optimize controller gains online by repeatedly querying the user for their preferences. This general methodology is applied to two variants of control Lyapunov function based nonlinear controllers framed as quadratic programs, which have nice theoretic properties but are challenging to realize in practice. These controllers are successfully demonstrated both on the planar underactuated biped, AMBER, and on the 3D underactuated biped, Cassie. We experimentally evaluate the performance of the learned controllers and show that the proposed method is repeatably able to learn gains that yield stable and robust locomotion.
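
The idea of tuning gains from pairwise preferences alone can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a simplified "keep the winner" comparison loop, and the gain grid, the `simulated_user_prefers` function, and its hidden ideal gains are all hypothetical stand-ins for a human operator watching the robot walk.

```python
def simulated_user_prefers(a, b, hidden_ideal=(40.0, 4.0)):
    """Hypothetical stand-in for a human: prefers the (kp, kd) pair
    closer to a hidden ideal that the learner never sees directly."""
    def sq_dist(gains):
        return sum((g - h) ** 2 for g, h in zip(gains, hidden_ideal))
    return sq_dist(a) <= sq_dist(b)


def learn_gains(candidates, prefers):
    """Sweep the candidates, asking one pairwise query per step and
    keeping whichever gain set the user prefers."""
    best = candidates[0]
    queries = 0
    for challenger in candidates[1:]:
        queries += 1
        if prefers(challenger, best):
            best = challenger
    return best, queries


# Assumed search space: proportional gains 10..100, derivative gains 1..10.
candidates = [(kp, kd) for kp in range(10, 101, 10) for kd in range(1, 11)]
best_gains, n_queries = learn_gains(candidates, simulated_user_prefers)
print(best_gains, n_queries)
```

No numeric reward is ever computed by the learner; it only sees which of two candidates was preferred. A practical method would need far fewer queries than this exhaustive sweep, for example by maintaining a model of the user's underlying utility and choosing informative comparisons.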

Preference-Based Learning for User-Guided HZD Gait Generation on Bipedal Walking Robots

Maegan Tucker, Noel Csomay-Shanklin, Wen-Loong Ma (2020)

This paper presents a framework that leverages both control theory and machine learning to obtain stable and robust bipedal locomotion without the need for manual parameter tuning. Traditionally, gaits are generated through trajectory optimization methods and then realized experimentally -- a process that often requires extensive tuning due to differences between the models and hardware. In this work, the process of gait realization via hybrid zero dynamics (HZD) based optimization is formally combined with preference-based learning to systematically realize dynamically stable walking. Importantly, this learning approach does not require a carefully constructed reward function, but instead utilizes human pairwise preferences. The power of the proposed approach is demonstrated through two experiments on a planar biped AMBER-3M: the first with rigid point-feet, and the second with induced model uncertainty through the addition of springs where the added compliance was not accounted for in the gait generation or in the controller. In both experiments, the framework achieves stable, robust, efficient, and natural walking in fewer than 50 iterations with no reliance on a simulation environment. These results demonstrate a promising step in the unification of control theory and learning.

Robotics Machine Learning

Reinforcement Learning for Robust Parameterized Locomotion Control of Bipedal Robots

Zhongyu Li, Xuxin Cheng, Xue Bin Peng (2021)

Developing robust walking controllers for bipedal robots is a challenging endeavor. Traditional model-based locomotion controllers require simplifying assumptions and careful modelling; any small errors can result in unstable control. To address these challenges for bipedal locomotion, we present a model-free reinforcement learning framework for training robust locomotion policies in simulation, which can then be transferred to a real bipedal Cassie robot. To facilitate sim-to-real transfer, domain randomization is used to encourage the policies to learn behaviors that are robust across variations in system dynamics. The learned policies enable Cassie to perform a set of diverse and dynamic behaviors, while also being more robust than traditional controllers and prior learning-based methods that use residual control. We demonstrate this on versatile walking behaviors such as tracking a target walking velocity, walking height, and turning yaw.
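
Domain randomization, as mentioned above, can be sketched as resampling the simulator's dynamics parameters at the start of every training episode. The parameter names, nominal values, and ranges below are hypothetical and chosen only for illustration; the abstract does not state which quantities were randomized.

```python
import random

# Hypothetical nominal dynamics parameters and relative perturbation ranges.
NOMINAL = {"link_mass_kg": 10.0, "ground_friction": 0.8, "joint_damping": 1.0}
REL_RANGE = {"link_mass_kg": 0.25, "ground_friction": 0.5, "joint_damping": 0.3}


def sample_dynamics(rng):
    """Draw one randomized set of dynamics parameters for an episode."""
    return {
        name: value * rng.uniform(1.0 - REL_RANGE[name], 1.0 + REL_RANGE[name])
        for name, value in NOMINAL.items()
    }


rng = random.Random(0)  # seeded so the sketch is reproducible
episodes = [sample_dynamics(rng) for _ in range(5)]
for params in episodes:
    print(params)
```

A policy trained across many such samples cannot rely on any single set of dynamics, which is the intuition behind its robustness when transferred to hardware.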

Robotics Artificial Intelligence Machine Learning

ReQuBiS -- Reconfigurable Quadrupedal-Bipedal Snake Robots

Harshad Zade, Aadesh Varude, Karan Pandya (2021)

The selection of mobility modes for robot navigation involves various trade-offs. Snake robots are ideal for traversing through constrained environments such as pipes, cluttered and rough terrain, whereas bipedal robots are more suited for structured environments such as stairs. Finally, quadruped robots are more stable than bipeds and can carry larger payloads than snakes and bipeds but struggle to navigate soft soil, sand, ice, and constrained environments. A reconfigurable robot can achieve the best of all worlds. Unfortunately, state-of-the-art reconfigurable robots rely on the rearrangement of modules through complicated mechanisms to disassemble and assemble at different places, increasing the size, weight, and power (SWaP) requirements. We propose Reconfigurable Quadrupedal-Bipedal Snake Robots (ReQuBiS), which can transform between mobility modes without rearranging modules, hence requiring just a single modification mechanism. Furthermore, our design allows the robot to split into two agents to perform tasks in parallel for biped and snake mobility. Experimental results demonstrate these mobility capabilities in snake, quadruped, and biped modes and transitions between them.

Robotics

Capture Steps: Robust Walking for Humanoid Robots

Marcell Missura, Maren Bennewitz, Sven Behnke (2020)

Stable bipedal walking is a key prerequisite for humanoid robots to reach their potential of being versatile helpers in our everyday environments. Bipedal walking is, however, a complex motion that requires the coordination of many degrees of freedom while it is also inherently unstable and sensitive to disturbances. The balance of a walking biped has to be constantly maintained. The most effective way of controlling balance is through well-timed and well-placed recovery steps -- capture steps -- that absorb the excess momentum gained from a push or a stumble. We present a bipedal gait generation framework that utilizes step timing and foot placement techniques in order to recover the balance of a biped even after strong disturbances. Our framework modifies the next footstep location instantly when responding to a disturbance and generates controllable omnidirectional walking using only very little sensing and computational power. We exploit the open-loop stability of a central pattern generated gait to fit a linear inverted pendulum model to the observed center of mass trajectory. Then, we use the fitted model to predict suitable footstep locations and timings in order to maintain balance while following a target walking velocity. Our experiments show qualitative and statistical evidence of one of the strongest push-recovery capabilities among humanoid robots to date.
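
The role of the linear inverted pendulum model in choosing a recovery step can be illustrated with the standard one-dimensional textbook formulas. This is a sketch under stated assumptions, not the framework from the paper: the function names are hypothetical, the center of mass height is assumed constant, and positions are measured relative to the stance foot.

```python
import math


def lipm_omega(com_height, g=9.81):
    """Natural frequency of the linear inverted pendulum, sqrt(g / z0)."""
    return math.sqrt(g / com_height)


def lipm_state(x0, v0, t, com_height, g=9.81):
    """Closed-form CoM position and velocity after time t, relative to
    a fixed stance foot at the origin."""
    w = lipm_omega(com_height, g)
    x = x0 * math.cosh(w * t) + (v0 / w) * math.sinh(w * t)
    v = x0 * w * math.sinh(w * t) + v0 * math.cosh(w * t)
    return x, v


def capture_point(x, v, com_height, g=9.81):
    """Foot location at which the pendulum comes to rest over the foot."""
    return x + v / lipm_omega(com_height, g)


# Assumed example: CoM 0.8 m high, pushed to 0.5 m/s while over the foot.
height, push_velocity = 0.8, 0.5
step = capture_point(0.0, push_velocity, height)

# After stepping onto the capture point, the CoM sits at -step relative
# to the new stance foot and should converge to rest above it.
x_end, v_end = lipm_state(-step, push_velocity, 5.0, height)
print(step, x_end, v_end)
```

Stepping short of the capture point leaves residual momentum that carries the center of mass past the foot, while stepping beyond it makes the pendulum fall back, which is why both placement and timing of the step matter.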

Robotics

Invariant Filtering for Bipedal Walking on Dynamic Rigid Surfaces with Orientation-based Measurement Model

Yuan Gao, Yan Gu (2021)

Real-world applications of bipedal robot walking require accurate, real-time state estimation. State estimation for locomotion over dynamic rigid surfaces (DRS), such as elevators, ships, public transport vehicles, and aircraft, remains under-explored, although state estimator designs for stationary rigid surfaces have been extensively studied. Addressing DRS locomotion in state estimation is a challenging problem mainly due to the nonlinear, hybrid nature of walking dynamics, the nonstationary surface-foot contact points, and hardware imperfections (e.g., limited availability, noise, and drift of onboard sensors). Towards solving this problem, we introduce an Invariant Extended Kalman Filter (InEKF) whose process and measurement models explicitly consider the DRS movement and hybrid walking behaviors while respectively satisfying the group-affine condition and invariant form. Due to these attractive properties, the estimation error convergence of the filter is provably guaranteed for hybrid DRS locomotion. The measurement model of the filter also exploits the holonomic constraint associated with the support-foot and surface orientations, under which the robot's yaw angle in the world becomes observable in the presence of general DRS movement. Experimental results of bipedal walking on a rocking treadmill demonstrate that the proposed filter ensures rapid error convergence and an observable base yaw angle.

Robotics