Learning adaptable policies is crucial for robots to operate autonomously in our complex and quickly changing world. In this work, we present a new meta-learning method that allows robots to quickly adapt to changes in dynamics. In contrast to gradient-based meta-learning algorithms that rely on second-order gradient estimation, we introduce a more noise-tolerant Batch Hill-Climbing adaptation operator and combine it with meta-learning based on evolutionary strategies. Our method significantly improves adaptation to changes in dynamics in high noise settings, which are common in robotics applications. We validate our approach on a quadruped robot that learns to walk while subject to changes in dynamics. We observe that our method significantly outperforms prior gradient-based approaches, enabling the robot to adapt its policy to changes based on less than 3 minutes of real data.
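The adaptation operator named above can be illustrated with a minimal sketch. The abstract does not give the operator's details, so everything here is an assumption: the function name `batch_hill_climb`, the parameters `batch_size` and `sigma`, and the rule of moving to the best of a batch of Gaussian perturbations only when it beats the current parameters.

```python
import random


def batch_hill_climb(reward_fn, theta, num_steps=5, batch_size=8, sigma=0.1, rng=None):
    """Hypothetical batch hill-climbing step; not the paper's exact operator."""
    rng = rng or random.Random(0)
    best_theta = list(theta)
    best_reward = reward_fn(best_theta)
    for _ in range(num_steps):
        # Sample a batch of Gaussian perturbations around the current parameters.
        candidates = [
            [t + sigma * rng.gauss(0.0, 1.0) for t in best_theta]
            for _ in range(batch_size)
        ]
        rewards = [reward_fn(c) for c in candidates]
        # Keep the best candidate only if it improves on the incumbent.
        i = max(range(batch_size), key=rewards.__getitem__)
        if rewards[i] > best_reward:
            best_theta, best_reward = candidates[i], rewards[i]
    return best_theta, best_reward


if __name__ == "__main__":
    # Toy reward with its maximum at theta = (1, 1); stands in for a rollout return.
    toy_reward = lambda th: -sum((t - 1.0) ** 2 for t in th)
    print(batch_hill_climb(toy_reward, [0.0, 0.0]))
```

Because the update only compares reward values, it needs no gradient estimate, which is one plausible reason such an operator would tolerate noisy rollouts better than second-order gradient estimation.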
Robot gait optimization is the task of generating an optimal control trajectory under various internal and external constraints. Given the high dimensionality of the control space, this problem is particularly challenging for multi-legged robots walking in complex and unknown environments. The existing literature often treats gait generation as an optimization problem and solves it from scratch for each specific environment. However, such approaches do not exploit pre-acquired knowledge, which can improve the quality and speed of motion generation in complex environments. To address this issue, this paper proposes a transfer learning-based evolutionary framework for multi-objective gait optimization, named Tr-GO. The idea is to initialize a high-quality population using transfer learning, so that any population-based optimization algorithm can be seamlessly integrated into the framework. The advantage is that the generated gait can not only adapt dynamically to different environments and tasks, but also simultaneously satisfy multiple design specifications (e.g., speed, stability). Experimental results show the effectiveness of the proposed framework for gait optimization with three multi-objective evolutionary algorithms: NSGA-II, RM-MEDA and MOPSO. When transferring pre-acquired knowledge from plain terrain to various inclined and rugged terrains, the proposed Tr-GO framework accelerates the evolution process by at least 3-4 times compared with non-transferred scenarios.
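The idea of seeding a population from prior knowledge can be sketched as follows. This is a deliberately simplified stand-in, not the Tr-GO transfer procedure, which the abstract does not specify: the function `transfer_init` and the parameters `transfer_fraction` and `sigma` are hypothetical. Part of the population is copied from source-environment solutions with small perturbations, and the rest is sampled uniformly.

```python
import random


def transfer_init(source_solutions, pop_size, bounds, transfer_fraction=0.5,
                  sigma=0.05, rng=None):
    """Hypothetical transfer-seeded initial population for a population-based optimizer."""
    rng = rng or random.Random(0)
    n_transfer = int(pop_size * transfer_fraction) if source_solutions else 0
    population = []
    for k in range(n_transfer):
        # Reuse a source solution, perturbed and clipped to the search bounds.
        base = source_solutions[k % len(source_solutions)]
        child = [
            min(hi, max(lo, x + sigma * (hi - lo) * rng.gauss(0.0, 1.0)))
            for x, (lo, hi) in zip(base, bounds)
        ]
        population.append(child)
    while len(population) < pop_size:
        # Fill the remainder with uniform random individuals to keep diversity.
        population.append([rng.uniform(lo, hi) for lo, hi in bounds])
    return population


if __name__ == "__main__":
    flat_terrain_gaits = [[0.5, 0.2], [0.6, 0.3]]  # made-up source solutions
    print(transfer_init(flat_terrain_gaits, pop_size=6, bounds=[(0.0, 1.0), (0.0, 1.0)]))
```

The returned list can be handed to any population-based optimizer as its initial generation, which is the sense in which the initialization is independent of the optimizer.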
While neural networks are powerful function approximators, they suffer from catastrophic forgetting when the data distribution is not stationary. One particular formalism that studies learning under a non-stationary distribution is provided by continual learning, where the non-stationarity is imposed by a sequence of distinct tasks. Most methods in this space, however, assume knowledge of task boundaries and focus on alleviating catastrophic forgetting. In this work, we depart from this view and move the focus towards faster remembering, i.e., measuring how quickly the network recovers performance rather than measuring the network's performance without any adaptation. We argue that in many settings this can be more effective and that it opens the door to combining meta-learning and continual learning techniques, leveraging their complementary advantages. We propose a framework specific to the scenario where no information about task boundaries or task identity is given. It relies on a separation of concerns into what task is being solved and how the task should be solved. This framework is implemented by differentiating task-specific parameters from task-agnostic parameters, where the latter are optimized in a continual meta-learning fashion, without access to multiple tasks at the same time. We showcase this framework in a supervised learning scenario and discuss the implications of the proposed formalism.
Recent work has shown results on learning navigation policies for idealized cylinder agents in simulation and transferring them to real wheeled robots. Deploying such navigation policies on legged robots can be challenging due to their complex dynamics, and the large dynamical difference between cylinder agents and legged systems. In this work, we learn hierarchical navigation policies that account for the low-level dynamics of legged robots, such as maximum speed, slipping, contacts, and learn to successfully navigate cluttered indoor environments. To enable transfer of policies learned in simulation to new legged robots and hardware, we learn dynamics-aware navigation policies across multiple robots with robot-specific embeddings. The learned embedding is optimized on new robots, while the rest of the policy is kept fixed, allowing for quick adaptation. We train our policies across three legged robots in simulation - 2 quadrupeds (A1, AlienGo) and a hexapod (Daisy). At test time, we study the performance of our learned policy on two new legged robots in simulation (Laikago, 4-legged Daisy), and one real-world quadrupedal robot (A1). Our experiments show that our learned policy can sample-efficiently generalize to previously unseen robots, and enable sim-to-real transfer of navigation policies for legged robots.
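Adapting only the robot-specific embedding while the policy stays fixed can be sketched as below. The abstract does not say how the embedding is optimized, so the linear `policy`, the function `adapt_embedding`, and the use of finite-difference gradient descent are all assumptions made for illustration.

```python
def policy(weights, embedding, obs):
    """Toy linear policy over the concatenated [observation, embedding] features."""
    features = list(obs) + list(embedding)
    return sum(w * x for w, x in zip(weights, features))


def adapt_embedding(loss_fn, embedding, steps=100, lr=0.1, eps=1e-4):
    """Minimize loss_fn over the embedding only, using central finite differences."""
    z = list(embedding)
    for _ in range(steps):
        grad = []
        for i in range(len(z)):
            z_hi = z[:i] + [z[i] + eps] + z[i + 1:]
            z_lo = z[:i] + [z[i] - eps] + z[i + 1:]
            grad.append((loss_fn(z_hi) - loss_fn(z_lo)) / (2 * eps))
        z = [zi - lr * g for zi, g in zip(z, grad)]
    return z


if __name__ == "__main__":
    weights = [1.0, 2.0]          # policy parameters, never updated here
    obs, target = [1.0], 5.0      # made-up observation and desired action
    loss = lambda z: (policy(weights, z, obs) - target) ** 2
    print(adapt_embedding(loss, [0.0]))
```

Only the low-dimensional embedding changes during adaptation, so the number of parameters fitted on a new robot stays small regardless of the policy's size.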
We introduce a robust control architecture for the whole-body motion control of torque-controlled robots with arms and legs. The method is based on the robust control of contact forces in order to track a planned Center of Mass trajectory. Its appeal lies in the ability to guarantee robust stability and performance despite rigid body model mismatch, actuator dynamics, delays, contact surface stiffness, and unobserved ground profiles. Furthermore, we introduce a task space decomposition approach which removes the coupling effects between the contact force controller and the other non-contact controllers. Finally, we verify our control performance on a quadruped robot and compare its performance to a standard inverse dynamics approach on hardware.
In this paper we present a new approach for dynamic motion planning for legged robots. We formulate a trajectory optimization problem based on a compact form of the robot dynamics. Such a form is obtained by projecting the rigid body dynamics onto the null space of the constraint Jacobian. As a consequence of the projection, contact forces are removed from the model, but their effects are still taken into account. This approach makes it possible to solve the optimal control problem of a floating-base constrained multibody system while avoiding the use of an explicit contact model. We use direct transcription to numerically solve the optimization. As the contact forces are not part of the decision variables, the size of the resulting discrete mathematical program is reduced, and therefore solutions can be obtained in a tractable time. Using a predefined sequence of contact configurations (phases), our approach solves motions where contact switches occur. Transitions between phases are automatically resolved without using a model for switching dynamics. We present results on a hydraulic quadruped robot (HyQ), including single-phase (standing, crouching) as well as multiple-phase (rearing, diagonal leg balancing and stepping) dynamic motions.
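The null-space projection described above can be written out in a standard form. The notation is assumed rather than taken from the abstract: q for the generalized coordinates, M for the inertia matrix, h for the Coriolis, centrifugal and gravity terms, S for the actuator selection matrix, tau for the joint torques, J_c for the constraint Jacobian, lambda for the contact forces, and the orthogonal projector P built from the Moore-Penrose pseudoinverse of J_c.

```latex
% Assumed notation; a standard projected-dynamics form, not necessarily the paper's.
\begin{align}
  M(q)\,\ddot{q} + h(q,\dot{q}) &= S^{\top}\tau + J_c^{\top}\lambda,
    \qquad J_c\,\dot{q} = 0, \\
  P &= I - J_c^{+} J_c, \qquad P = P^{\top}, \qquad
    J_c P = 0 \;\Rightarrow\; P J_c^{\top} = 0, \\
  P\,M(q)\,\ddot{q} + P\,h(q,\dot{q}) &= P\,S^{\top}\tau .
\end{align}
```

Multiplying the first equation by P eliminates the term containing lambda, which is how the contact forces drop out of the decision variables while the constraint they enforce remains encoded in P.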