Morphological development is part of the way any human or animal learns. The learning process starts with the morphology present at birth and progresses through changing morphologies until adulthood is reached. Biologically, this seems to facilitate learning and make it more robust. However, when this approach is transferred to robotic systems, the results found in the literature are inconsistent: morphological development does not provide a learning advantage in every case, and it can even lead to poorer results than learning with a fixed morphology. In this paper, we analyze some of the issues involved by means of a simple but very informative experiment in quadruped walking. From the results obtained, we present an initial series of insights into when and under what conditions morphological development is beneficial for learning.
Natural beings undergo a morphological development process of their bodies while they are learning and adapting to the environments they face from infancy to adulthood. In fact, this is the period when the most important learning processes, those that will support learning as adults, take place. However, in artificial systems, this interaction between morphological development and learning, and its possible advantages, has seldom been considered. Along these lines, this paper seeks to provide some insights into how morphological development can be harnessed in order to facilitate learning in embodied systems facing tasks or domains that are hard to learn. In particular, we concentrate on whether morphological development can really provide an advantage when learning complex tasks, and whether its relevance to learning increases as tasks become harder. To this end, we present the results of initial experiments on the application of morphological development to learning to walk in three cases: a quadruped, a hexapod, and an octopod. These results seem to confirm that as the difficulty of the learning task increases, applying morphological development to learning becomes more advantageous.
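To make the idea in these two abstracts concrete, here is a minimal sketch of a morphological development schedule: a simulated walker whose leg-segment length grows linearly from an "infant" to an "adult" value during the first part of training, then stays fixed. All function names, values, and the growth schedule are illustrative assumptions, not the papers' actual setup.

```python
def morphology_at(episode, total_episodes,
                  infant_leg_length=0.05, adult_leg_length=0.20,
                  growth_fraction=0.5):
    """Linearly grow the leg length during the first `growth_fraction`
    of training; after that the morphology stays at its adult value."""
    progress = min(episode / (growth_fraction * total_episodes), 1.0)
    return infant_leg_length + progress * (adult_leg_length - infant_leg_length)

# Hypothetical training loop: the simulated robot is rebuilt with the
# current morphology before each learning episode.
total_episodes = 500
for episode in range(total_episodes):
    leg_length = morphology_at(episode, total_episodes)
    # env = make_quadruped(leg_length=leg_length)  # simulator-specific
    # run_learning_episode(env, policy)            # e.g. an evolutionary or RL update
```

Learning with a fixed morphology corresponds to setting `infant_leg_length = adult_leg_length`, which is the baseline these papers compare against.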
We present an algorithm for rapidly learning controllers for robotic systems. The algorithm follows the model-based reinforcement learning paradigm and improves upon existing algorithms, namely Probabilistic Inference for Learning Control (PILCO) and a sample-based version of PILCO with neural network dynamics (Deep-PILCO). We propose training a neural network dynamics model using variational dropout with truncated log-normal noise. This allows us to obtain a dynamics model with calibrated uncertainty, which can be used to simulate controller executions via rollouts. We also describe a set of techniques, inspired by viewing PILCO as a recurrent neural network model, that are crucial to improving the convergence of the method. We test our method on a variety of benchmark tasks, demonstrating data efficiency that is competitive with PILCO while being able to optimize complex neural network controllers. Finally, we assess the performance of the algorithm in learning motor controllers for a six-legged autonomous underwater vehicle, demonstrating its potential for scaling up to higher-dimensional and larger-dataset regimes in more complex control tasks.
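The core mechanism here, a dynamics model whose uncertainty is sampled via dropout and propagated through simulated rollouts, can be sketched as follows. This sketch substitutes standard Bernoulli dropout for the paper's truncated log-normal variational dropout, and all sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class DropoutDynamics(nn.Module):
    """One-step dynamics model s' = f(s, a) with dropout-based uncertainty.
    Keeping dropout active at prediction time yields a sampled model per pass."""
    def __init__(self, state_dim, action_dim, hidden=200, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        # Predict the change in state; the residual form keeps rollouts stable.
        return state + self.net(torch.cat([state, action], dim=-1))

def rollout(model, policy, s0, horizon, n_particles=10):
    """Simulate controller executions by propagating particles through
    the stochastic dynamics model."""
    model.train()  # keep dropout on so each particle sees a sampled dynamics
    states = s0.repeat(n_particles, 1)   # s0: (state_dim,)
    trajectory = [states]
    for _ in range(horizon):
        actions = policy(states)
        states = model(states, actions)
        trajectory.append(states)
    return torch.stack(trajectory)       # (horizon + 1, n_particles, state_dim)
```

Because the rollout is differentiable end to end, the expected cost over particles can be backpropagated into the parameters of a neural network `policy`, which is what makes the PILCO-as-recurrent-network view natural.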
Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the other extreme of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the term big data, we refer to this challenge as micro-data reinforcement learning. We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or of the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing the computing time.
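The second strategy, a surrogate model of the expected reward queried in place of the robot, is essentially Bayesian optimization over policy parameters. Below is a minimal sketch using a Gaussian process surrogate and an upper-confidence-bound acquisition; the toy return function, parameter dimension, and budget are illustrative stand-ins for real robot trials.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def episode_return(params):
    """Placeholder for one expensive trial on the physical robot."""
    return -np.sum((params - 0.3) ** 2)   # toy stand-in for the real return

dim = 5                                   # illustrative policy-parameter dimension
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(5, dim))     # a handful of initial trials
y = np.array([episode_return(x) for x in X])

gp = GaussianProcessRegressor(normalize_y=True)
for trial in range(12):                   # "a dozen" trials total
    gp.fit(X, y)
    candidates = rng.uniform(-1, 1, size=(2000, dim))
    mu, sigma = gp.predict(candidates, return_std=True)
    ucb = mu + 2.0 * sigma                # optimism under uncertainty
    x_next = candidates[np.argmax(ucb)]   # thousands of cheap surrogate queries...
    y_next = episode_return(x_next)       # ...for a single real trial
    X = np.vstack([X, x_next])
    y = np.append(y, y_next)

best_params = X[np.argmax(y)]
```

Priors (the survey's first strategy) would enter this loop as the GP mean function or as the initial set of trials, e.g. obtained from a simulator or from demonstrations.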
Mobile robot navigation is typically regarded as a geometric problem, in which the robot's objective is to perceive the geometry of the environment in order to plan collision-free paths towards a desired goal. However, a purely geometric view of the world can be insufficient for many navigation problems. For example, a robot navigating based on geometry may avoid a field of tall grass because it believes the grass is untraversable, and will therefore fail to reach its desired goal. In this work, we investigate how to move beyond these purely geometric approaches using a method that learns about physical navigational affordances from experience. Our approach, which we call BADGR, is an end-to-end learning-based mobile robot navigation system that can be trained with self-supervised, off-policy data gathered in real-world environments, without any simulation or human supervision. BADGR can navigate in real-world urban and off-road environments with geometrically distracting obstacles. It can also incorporate terrain preferences, generalize to novel environments, and continue to improve autonomously by gathering more data. Videos, code, and other supplemental material are available on our website: https://sites.google.com/view/badgr
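A rough sketch of the kind of architecture such a system could use: a model that, given the current observation and a candidate action sequence, predicts per-step event probabilities (e.g., collision, bumpy terrain), paired with a sampling-based planner that scores action sequences by predicted cost. This is an assumed simplification, not BADGR's actual implementation; every name and size below is illustrative.

```python
import torch
import torch.nn as nn

class AffordancePredictor(nn.Module):
    """From an observation embedding and an action sequence, predict
    per-step probabilities of navigational events (collision, bumpiness)."""
    def __init__(self, obs_dim=128, action_dim=2, hidden=64, n_events=2):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.rnn = nn.GRU(action_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_events)

    def forward(self, obs, actions):              # actions: (B, H, action_dim)
        h0 = torch.tanh(self.encoder(obs)).unsqueeze(0)
        out, _ = self.rnn(actions, h0)
        return torch.sigmoid(self.head(out))      # (B, H, n_events)

def plan(model, obs, horizon=8, n_samples=256, event_costs=(10.0, 1.0)):
    """Sampling-based planner: score random action sequences by predicted
    event costs and execute the first action of the best one. obs: (1, obs_dim)."""
    actions = torch.rand(n_samples, horizon, 2) * 2 - 1
    with torch.no_grad():
        probs = model(obs.expand(n_samples, -1), actions)
    costs = (probs * torch.tensor(event_costs)).sum(dim=(1, 2))
    return actions[costs.argmin(), 0]
```

The self-supervised aspect is that the training labels for such a predictor (did a collision occur, how bumpy was the terrain) come from the robot's own onboard sensors during data collection, with no human annotation.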
Many robot control scenarios involve assessing system robustness against a task specification. If either the controller or the environment is composed of black-box components with unknown dynamics, we cannot rely on formal verification to assess the system. Assessing robustness via exhaustive testing is also often infeasible when the space of environments is large compared to the cost of an experiment. Given a limited budget, we provide a method to choose the experiment inputs that give the greatest insight into system performance against a given specification across the domain. By combining smooth robustness metrics for signal temporal logic with techniques from adaptive experiment design, our method chooses the most informative experimental inputs by incrementally constructing a surrogate model of the specification robustness. The next experiment is then chosen in an area where the model has either high prediction error or high uncertainty. Our experiments show how this adaptive experimental design technique yields sample-efficient descriptions of system robustness. Further, we show how the model built via the experiment design process can be used to assess the behaviour of a data-driven control system under domain shift.
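The active-learning loop this describes can be sketched with a Gaussian process surrogate over the robustness value. For simplicity the sketch below selects experiments by predictive uncertainty alone (the paper's acquisition also accounts for prediction error), and the toy robustness function stands in for running a real experiment and evaluating the smooth STL robustness of its trace.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def robustness(env_params):
    """Placeholder: run one experiment with these environment parameters
    and return the smooth STL robustness of the trace (positive = satisfied)."""
    return np.sin(3 * env_params[0]) - 0.5 * env_params[1]   # toy stand-in

dim = 2
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(4, dim))      # a few initial experiments
y = np.array([robustness(x) for x in X])

gp = GaussianProcessRegressor(normalize_y=True)
for _ in range(20):                       # limited experiment budget
    gp.fit(X, y)
    candidates = rng.uniform(0, 1, size=(1000, dim))
    _, sigma = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(sigma)] # run where the surrogate is least sure
    X = np.vstack([X, x_next])
    y = np.append(y, robustness(x_next))
```

After the budget is spent, the fitted surrogate `gp` provides the sample-efficient description of robustness across the domain, e.g. predicting where the specification is likely violated, which is also what supports the domain-shift assessment mentioned above.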