Model-Based Quality-Diversity Search for Efficient Robot Learning

87 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Daniel Tanneberg

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Leon Keller - Daniel Tanneberg - Svenja Stark

الذكاء الاصطناعي التعلم الآلي الحوسبة العصبية والتطورية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Despite recent progress in robot learning, it still remains a challenge to program a robot to deal with open-ended object manipulation tasks. One approach that was recently used to autonomously generate a repertoire of diverse skills is a novelty based Quality-Diversity~(QD) algorithm. However, as most evolutionary algorithms, QD suffers from sample-inefficiency and, thus, it is challenging to apply it in real-world scenarios. This paper tackles this problem by integrating a neural network that predicts the behavior of the perturbed parameters into a novelty based QD algorithm. In the proposed Model-based Quality-Diversity search (M-QD), the network is trained concurrently to the repertoire and is used to avoid executing unpromising actions in the novelty search process. Furthermore, it is used to adapt the skills of the final repertoire in order to generalize the skills to different scenarios. Our experiments show that enhancing a QD algorithm with such a forward model improves the sample-efficiency and performance of the evolutionary process and the skill adaptation.

قيم البحث

اقرأ أيضاً

Few-shot Quality-Diversity Optimisation

106 - Achkan Salehi , Alexandre Coninx , Stephane Doncieux 2021

In the past few years, a considerable amount of research has been dedicated to the exploitation of previous learning experiences and the design of Few-shot and Meta Learning approaches, in problem domains ranging from Computer Vision to Reinforcement Learning based control. A notable exception, where to the best of our knowledge, little to no effort has been made in this direction is Quality-Diversity (QD) optimisation. QD methods have been shown to be effective tools in dealing with deceptive minima and sparse rewards in Reinforcement Learning. However, they remain costly due to their reliance on inherently sample inefficient evolutionary processes. We show that, given examples from a task distribution, information about the paths taken by optimisation in parameter space can be leveraged to build a prior population, which when used to initialise QD methods in unseen environments, allows for few-shot adaptation. Our proposed method does not require backpropagation. It is simple to implement and scale, and furthermore, it is agnostic to the underlying models that are being trained. Experiments carried in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations that are required for QD optimisation in these environments.

التعلم الآلي الذكاء الاصطناعي الحوسبة العصبية والتطورية

Safe Interactive Model-Based Learning

75 - Marco Gallieri , Seyed Sina Mirrazavi Salehian , Nihat Engin Toklun 2019

Control applications present hard operational constraints. A violation of these can result in unsafe behavior. This paper introduces Safe Interactive Model Based Learning (SiMBL), a framework to refine an existing controller and a system model while operating on the real environment. SiMBL is composed of the following trainable components: a Lyapunov function, which determines a safe set; a safe control policy; and a Bayesian RNN forward model. A min-max control framework, based on alternate minimisation and backpropagation through the forward model, is used for the offline computation of the controller and the safe set. Safety is formally verified a-posteriori with a probabilistic method that utilizes the Noise Contrastive Priors (NPC) idea to build a Bayesian RNN forward model with an additive state uncertainty estimate which is large outside the training data distribution. Iterative refinement of the model and the safe set is achieved thanks to a novel loss that conditions the uncertainty estimates of the new model to be close to the current one. The learned safe set and model can also be used for safe exploration, i.e., to collect data within the safe invariant set, for which a simple one-step MPC is proposed. The single components are tested on the simulation of an inverted pendulum with limited torque and stability region, showing that iteratively adding more data can improve the model, the controller and the size of the safe region.

أنظمة وتحكم التعلم الآلي الحوسبة العصبية والتطورية

A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

93 - Mingde Zhao , Zhen Liu , Sitao Luan 2021

We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state, in order to plan and to generalize better out-of-distribution. The agents architecture uses a set representation and a b ottleneck mechanism, forcing the number of entities to which the agent attends at each planning step to be small. In experiments with customized MiniGrid environments with different dynamics, we observe that the design allows agents to learn to plan effectively, by attending to the relevant objects, leading to better out-of-distribution generalization.

الذكاء الاصطناعي التعلم الآلي

Maximizing Store Revenues using Tabu Search for Floor Space Optimization

82 - Jiefeng Xu , Evren Gul , Alvin Lim 2020

Floor space optimization is a critical revenue management problem commonly encountered by retailers. It maximizes store revenue by optimally allocating floor space to product categories which are assigned to their most appropriate planograms. We form ulate the problem as a connected multi-choice knapsack problem with an additional global constraint and propose a tabu search based meta-heuristic that exploits the multiple special neighborhood structures. We also incorporate a mechanism to determine how to combine the multiple neighborhood moves. A candidate list strategy based on learning from prior search history is also employed to improve the search quality. The results of computational testing with a set of test problems show that our tabu search heuristic can solve all problems within a reasonable amount of time. Analyses of individual contributions of relevant components of the algorithm were conducted with computational experiments.

الذكاء الاصطناعي التعلم الآلي الحوسبة العصبية والتطورية

Transfer Learning to Learn with Multitask Neural Model Search

145 - Catherine Wong , Andrea Gesmundo 2017

Deep learning models require extensive architecture design exploration and hyperparameter optimization to perform well on a given task. The exploration of the model design space is often made by a human expert, and optimized using a combination of gr id search and search heuristics over a large space of possible choices. Neural Architecture Search (NAS) is a Reinforcement Learning approach that has been proposed to automate architecture design. NAS has been successfully applied to generate Neural Networks that rival the best human-designed architectures. However, NAS requires sampling, constructing, and training hundreds to thousands of models to achieve well-performing architectures. This procedure needs to be executed from scratch for each new task. The application of NAS to a wide set of tasks currently lacks a way to transfer generalizable knowledge across tasks. In this paper, we present the Multitask Neural Model Search (MNMS) controller. Our goal is to learn a generalizable framework that can condition model construction on successful model searches for previously seen tasks, thus significantly speeding up the search for new tasks. We demonstrate that MNMS can conduct an automated architecture search for multiple tasks simultaneously while still learning well-performing, specialized models for each task. We then show that pre-trained MNMS controllers can transfer learning to new tasks. By leveraging knowledge from previous searches, we find that pre-trained MNMS models start from a better location in the search space and reduce search time on unseen tasks, while still discovering models that outperform published human-designed models.

الذكاء الاصطناعي التعلم الآلي التعلم الالي