Designing robots by hand can be costly and time-consuming, especially if the robots have to be created with novel materials or be robust to internal or external changes. In order to create robots automatically, without the need for human intervention, it is necessary to optimise both the behaviour and the body design of the robot. However, when co-optimising the morphology and controller of a locomoting agent, the morphology tends to converge prematurely, reaching a local optimum. Approaches such as explicit protection of morphological innovation have been used to reduce this problem, but it might also be possible to increase exploration of morphologies using a more indirect approach. We explore how changing the environment in which the agent locomotes affects the convergence of morphologies. The agents' morphologies and controllers are co-optimised, while the environments the agents locomote in are evolved open-endedly with the Paired Open-Ended Trailblazer (POET). We compare the diversity, fitness and robustness of agents evolving in environments generated by POET to agents evolved in handcrafted curricula of environments. Each of our agents consists of a population of individuals evolved with a genetic algorithm. This population is called the agent-population. We show that agent-populations evolving in open-endedly evolving environments exhibit larger morphological diversity than agent-populations evolving in handcrafted curricula of environments. POET proved capable of creating a curriculum of environments which encouraged both diversity and quality in the populations. This suggests that POET may be capable of reducing premature convergence in co-optimisation of morphology and controllers.
Open-ended learning is a core research field of machine learning and robotics aiming to build learning machines and robots able to autonomously acquire knowledge and skills and to reuse them to solve novel tasks. The multiple challenges posed by open-ended learning have been operationalized in the robotic competition REAL 2020. The competition requires a simulated camera-arm-gripper robot to (a) autonomously learn to interact with objects during an intrinsic phase, where it can learn how to move objects, and then (b) during an extrinsic phase, re-use the acquired knowledge to accomplish externally given goals requiring the robot to move objects to specific locations unknown during the intrinsic phase. Here we present a baseline architecture for solving the challenge, provided as the baseline model for REAL 2020. Few models have all the functionalities needed to solve the REAL 2020 benchmark and none has been tested with it yet. The architecture we propose is formed by three components: (1) Abstractor: abstracting sensory input to learn relevant control variables from images; (2) Explorer: generating experience to learn goals and actions; (3) Planner: formulating and executing action plans to accomplish the externally provided goals. The architecture represents the first model to solve the simpler REAL 2020 Round 1, which allows the use of a simple parameterised push action. On Round 2, the architecture was used with a more general action (a sequence of joint positions), again achieving higher-than-chance performance. The baseline software is well documented and available for download and use at https://github.com/AIcrowd/REAL2020_starter_kit.
Natural evolution has produced a tremendous diversity of functional organisms. Many believe an essential component of this process was the evolution of evolvability, whereby evolution speeds up its ability to innovate by generating a more adaptive pool of offspring. One hypothesized mechanism for evolvability is developmental canalization, wherein certain dimensions of variation become more likely to be traversed and others are prevented from being explored (e.g. offspring tend to have similarly sized legs, and mutations affect the length of both legs, not each leg individually). While ubiquitous in nature, canalization almost never evolves in computational simulations of evolution. Not only does that deprive us of in silico models in which to study the evolution of evolvability, but it also raises the question of which conditions give rise to this form of evolvability. Answering this question would shed light on why such evolvability emerged naturally and could accelerate engineering efforts to harness evolution to solve important engineering challenges. In this paper we reveal a unique system in which canalization did emerge in computational evolution. We document that genomes entrench certain dimensions of variation that were frequently explored during their evolutionary history. The genetic representation of these organisms also evolved to be highly modular and hierarchical, and we show that these organizational properties correlate with increased fitness. Interestingly, the type of computational evolutionary experiment that produced this evolvability was very different from traditional digital evolution in that there was no objective, suggesting that open-ended, divergent evolutionary processes may be necessary for the evolution of evolvability.
Animals ranging from rats to humans can demonstrate cognitive map capabilities. We evolved weights in a biologically plausible recurrent neural network (RNN) using an evolutionary algorithm to replicate the behavior and neural activity observed in rats during a spatial and working memory task in a triple T-maze. The rat was simulated in the Webots robot simulator and used vision, distance and accelerometer sensors to navigate a virtual maze. After evolving weights from sensory inputs to the RNN, within the RNN, and from the RNN to the robot's motors, the Webots agent successfully navigated the space to reach all four reward arms with minimal repeats before time-out. Our current findings suggest that the RNN dynamics are key to performance, and that performance does not depend on any one sensory type, which suggests that neurons in the RNN exhibit mixed selectivity and conjunctive coding. Moreover, the RNN activity resembles spatial information and trajectory-dependent coding observed in the hippocampus. Collectively, the evolved RNN exhibits navigation skills, spatial memory, and working memory. Our method demonstrates how the dynamic activity in evolved RNNs can capture interesting and complex cognitive behavior and may be used to create RNN controllers for robotic applications.
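As an illustration of evolving the three weight groups named in the abstract above (input to RNN, within the RNN, RNN to output), the following is a minimal sketch and not the authors' method. The network size `N`, the toy one-step memory task in `fitness`, the elitist mutation-only scheme in `evolve`, and all function names are assumptions made for this example; no Webots simulation is involved.

```python
import math
import random
from typing import List, Tuple

N = 3  # number of recurrent units (hypothetical size)
GENOME_LEN = N + N * N + N  # input->RNN, RNN->RNN, RNN->output weights


def run_rnn(genome: List[float], inputs: List[float]) -> List[float]:
    """Unroll a tiny tanh RNN whose weights are read from a flat genome."""
    w_in = genome[:N]
    w_rec = [genome[N + i * N: N + (i + 1) * N] for i in range(N)]
    w_out = genome[N + N * N:]
    h = [0.0] * N
    outputs = []
    for x in inputs:
        h = [math.tanh(w_in[i] * x + sum(w_rec[i][j] * h[j] for j in range(N)))
             for i in range(N)]
        outputs.append(sum(w_out[i] * h[i] for i in range(N)))
    return outputs


def fitness(genome: List[float]) -> float:
    """Toy memory task (an assumption): output the previous input."""
    inputs = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]
    targets = [0.0] + inputs[:-1]
    outputs = run_rnn(genome, inputs)
    return -sum((o - t) ** 2 for o, t in zip(outputs, targets))


def evolve(generations: int = 50, pop_size: int = 20,
           seed: int = 0) -> Tuple[List[float], List[float]]:
    """Elitist evolution: keep the top quarter, refill with mutated copies."""
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(GENOME_LEN)]
                  for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation, then final
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        elite = population[: pop_size // 4]
        children = [[g + rng.gauss(0.0, 0.1) for g in rng.choice(elite)]
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best_genome, best_history = evolve()
```

Because the elite is carried over unchanged, the best fitness recorded in `best_history` cannot decrease from one generation to the next; how close it gets to zero depends on the seed and the hypothetical settings above.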
Creating open-ended algorithms, which generate their own never-ending stream of novel and appropriately challenging learning opportunities, could help to automate and accelerate progress in machine learning. A recent step in this direction is the Paired Open-Ended Trailblazer (POET), an algorithm that generates and solves its own challenges, and allows solutions to goal-switch between challenges to avoid local optima. However, the original POET was unable to demonstrate its full creative potential because of limitations of the algorithm itself and because of external issues including a limited problem space and lack of a universal progress measure. Importantly, both limitations pose impediments not only for POET, but for the pursuit of open-endedness in general. Here we introduce and empirically validate two new innovations to the original algorithm, as well as two external innovations designed to help elucidate its full potential. Together, these four advances enable the most open-ended algorithmic demonstration to date. The algorithmic innovations are (1) a domain-general measure of how meaningfully novel new challenges are, enabling the system to potentially create and solve interesting challenges endlessly, and (2) an efficient heuristic for determining when agents should goal-switch from one problem to another (helping open-ended search better scale). Outside the algorithm itself, to enable a more definitive demonstration of open-endedness, we introduce (3) a novel, more flexible way to encode environmental challenges, and (4) a generic measure of the extent to which a system continues to exhibit open-ended innovation. Enhanced POET produces a diverse range of sophisticated behaviors that solve a wide range of environmental challenges, many of which cannot be solved through other means.
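The goal-switching idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the POET implementation: `attempt_transfers`, `toy_evaluate`, and the scalar agents and environments are hypothetical placeholders, and the rule used here (replace an incumbent whenever another agent scores strictly higher in its environment) is a deliberately naive stand-in for the efficient heuristic the abstract refers to.

```python
from typing import Callable, List, Tuple


def attempt_transfers(
    pairs: List[Tuple[float, float]],
    evaluate: Callable[[float, float], float],
) -> List[Tuple[float, float]]:
    """Return new (environment, agent) pairs after one round of goal-switching.

    Every agent is scored in every environment; an incumbent is replaced
    only if some other agent scores strictly higher in its environment.
    """
    agents = [agent for _, agent in pairs]
    new_pairs = []
    for env, incumbent in pairs:
        best_agent = incumbent
        best_score = evaluate(env, incumbent)
        for candidate in agents:
            score = evaluate(env, candidate)
            if score > best_score:
                best_agent, best_score = candidate, score
        new_pairs.append((env, best_agent))
    return new_pairs


def toy_evaluate(env: float, agent: float) -> float:
    # Hypothetical fitness: an agent does best when its single parameter
    # matches the environment's single parameter.
    return -abs(env - agent)


# Two environments, each initially paired with a poorly matched agent.
pairs = [(0.0, 0.9), (1.0, 0.1)]
switched = attempt_transfers(pairs, toy_evaluate)
# Each environment ends up with the agent that was stuck on the other one.
```

This exhaustive version evaluates every agent in every environment, so its cost grows quadratically with the number of active pairs.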
In swarm robotics, any of the robots in a swarm may be affected by different faults, resulting in significant performance declines. To allow fault recovery from randomly injected faults to different robots in a swarm, a model-free approach may be preferable due to the accumulation of faults in models and the difficulty of predicting the behaviour of neighbouring robots. One model-free approach to fault recovery involves two phases: during simulation, a quality-diversity algorithm evolves a behaviourally diverse archive of controllers; during the target application, a search for the best controller is initiated after fault injection. In quality-diversity algorithms, the choice of the behavioural descriptor is a key design decision that determines the quality of the evolved archives, and therefore the fault recovery performance. Although the environment is an important determinant of behaviour, the impact of environmental diversity is often ignored in the choice of a suitable behavioural descriptor. This study compares different behavioural descriptors, including two generic descriptors that work on a wide range of tasks, one hand-coded descriptor which fits the domain of interest, and one novel type of descriptor based on environmental diversity, which we call Quality-Environment-Diversity (QED). Results demonstrate that the above-mentioned model-free approach to fault recovery is feasible in the context of swarm robotics, reducing the fault impact by a factor of 2-3. Further, the environmental diversity obtained with QED yields a unique behavioural diversity profile that allows it to recover from high-impact faults.
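The two-phase approach described in the abstract above can be sketched on a toy problem. This is an illustrative assumption-laden example, not the study's setup: controllers are pairs of numbers in [0, 1], `descriptor` is a hypothetical one-dimensional behavioural descriptor, the archive is built with a basic MAP-Elites-style loop, and the post-fault search simply evaluates every archived controller, whereas the abstract does not specify which search procedure is used.

```python
import random
from typing import Callable, Dict, Tuple

Controller = Tuple[float, float]
BINS = 5  # number of archive cells (hypothetical)


def descriptor(c: Controller) -> int:
    # Hypothetical behavioural descriptor: discretise the first parameter.
    return min(int(c[0] * BINS), BINS - 1)


def build_archive(quality: Callable[[Controller], float],
                  iterations: int = 2000, seed: int = 0) -> Dict[int, Controller]:
    """Phase 1 (simulation): keep the best controller found in each cell."""
    rng = random.Random(seed)
    archive: Dict[int, Controller] = {}
    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mutate a randomly chosen elite, clamping genes to [0, 1].
            parent = rng.choice(list(archive.values()))
            child = tuple(min(1.0, max(0.0, g + rng.gauss(0.0, 0.1)))
                          for g in parent)
        else:
            child = (rng.random(), rng.random())
        cell = descriptor(child)
        if cell not in archive or quality(child) > quality(archive[cell]):
            archive[cell] = child
    return archive


def recover(archive: Dict[int, Controller],
            faulty_quality: Callable[[Controller], float]) -> Controller:
    """Phase 2 (after fault injection): pick the archived controller that
    performs best under the fault, here by exhaustive evaluation."""
    return max(archive.values(), key=faulty_quality)


def nominal_quality(c: Controller) -> float:
    # Toy fault-free performance: only the second parameter matters.
    return c[1]


def faulty_quality(c: Controller) -> float:
    # Toy fault: behaviours far from c[0] = 0.9 are now penalised.
    return c[1] - abs(c[0] - 0.9)


archive = build_archive(nominal_quality)
nominal_best = max(archive.values(), key=nominal_quality)
recovered = recover(archive, faulty_quality)
```

Because the archive retains one elite per behavioural cell rather than a single best controller, `recovered` is never worse under the fault than `nominal_best`, which is the intuition behind using behavioural diversity for recovery.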