Learning to Compose Skills

619 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Himanshu Sahni

تاريخ النشر 2017

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Himanshu Sahni - Saurabh Kumar - Farhan Tejani

الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We present a differentiable framework capable of learning a wide variety of compositions of simple policies that we call skills. By recursively composing skills with themselves, we can create hierarchies that display complex behavior. Skill networks are trained to generate skill-state embeddings that are provided as inputs to a trainable composition function, which in turn outputs a policy for the overall task. Our experiments on an environment consisting of multiple collect and evade tasks show that this architecture is able to quickly build complex skills from simpler ones. Furthermore, the learned composition function displays some transfer to unseen combinations of skills, allowing for zero-shot generalizations.

قيم البحث

342 - Gongjin Lan , Maarten van Hooft , Matteo De Carlo 2020

The challenge of robotic reproduction -- making of new robots by recombining two existing ones -- has been recently cracked and physically evolving robot systems have come within reach. Here we address the next big hurdle: producing an adequate brain for a newborn robot. In particular, we address the task of targeted locomotion which is arguably a fundamental skill in any practical implementation. We introduce a controller architecture and a generic learning method to allow a modular robot with an arbitrary shape to learn to walk towards a target and follow this target if it moves. Our approach is validated on three robots, a spider, a gecko, and their offspring, in three real-world scenarios.

الذكاء الاصطناعي الحوسبة العصبية والتطورية علم الروبوتات

The Option Keyboard: Combining Skills in Reinforcement Learning

111 - Andre Barreto , Diana Borsa , Shaobo Hou 2021

The ability to combine known skills to create new ones may be crucial in the solution of complex reinforcement learning problems that unfold over extended periods. We argue that a robust way of combining skills is to define and manipulate them in the space of pseudo-rewards (or cumulants). Based on this premise, we propose a framework for combining skills using the formalism of options. We show that every deterministic option can be unambiguously represented as a cumulant defined in an extended domain. Building on this insight and on previous results on transfer learning, we show how to approximate options whose cumulants are linear combinations of the cumulants of known options. This means that, once we have learned options associated with a set of cumulants, we can instantaneously synthesise options induced by any linear combination of them, without any learning involved. We describe how this framework provides a hierarchical interface to the environment whose abstract actions correspond to combinations of basic skills. We demonstrate the practical benefits of our approach in a resource management problem and a navigation task involving a quadrupedal simulated robot.

الذكاء الاصطناعي التعلم الآلي

Learning to Compose Hypercolumns for Visual Correspondence

213 - Juhong Min , Jongmin Lee , Jean Ponce 2020

Feature representation plays a crucial role in visual correspondence, and recent methods for image matching resort to deeply stacked convolutional layers. These models, however, are both monolithic and static in the sense that they typically use a sp ecific level of features, e.g., the output of the last layer, and adhere to it regardless of the images to match. In this work, we introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match. Inspired by both multi-layer feature composition in object detection and adaptive inference architectures in classification, the proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network. We demonstrate the effectiveness on the task of semantic correspondence, i.e., establishing correspondences between images depicting different instances of the same object or scene category. Experiments on standard benchmarks show that the proposed method greatly improves matching performance over the state of the art in an adaptive and efficient manner.

الرؤية الحاسوبية وتمييز الأنماط

Learning to Compose Words into Sentences with Reinforcement Learning

106 - Dani Yogatama , Phil Blunsom , Chris Dyer 2016

We use reinforcement learning to learn tree-structured neural networks for computing representations of natural language sentences. In contrast with prior work on tree-structured models in which the trees are either provided as input or predicted usi ng supervision from explicit treebank annotations, the tree structures in this work are optimized to improve performance on a downstream task. Experiments demonstrate the benefit of learning task-specific composition orders, outperforming both sequential encoders and recursive encoders based on treebank annotations. We analyze the induced trees and show that while they discover some linguistically intuitive structures (e.g., noun phrases, simple verb phrases), they are different than conventional English syntactic structures.

الحساب واللغة

Learning to Compose Dynamic Tree Structures for Visual Contexts

213 - Kaihua Tang , Hanwang Zhang , Baoyuan Wu 2018

We propose to compose dynamic tree structures that place the objects in an image into a visual context, helping visual reasoning tasks such as scene graph generation and visual Q&A. Our visual context tree model, dubbed VCTree, has two key advantages over existing structured object representations including chains and fully-connected graphs: 1) The efficient and expressive binary tree encodes the inherent parallel/hierarchical relationships among objects, e.g., clothes and pants are usually co-occur and belong to person; 2) the dynamic structure varies from image to image and task to task, allowing more content-/task-specific message passing among objects. To construct a VCTree, we design a score function that calculates the task-dependent validity between each object pair, and the tree is the binary version of the maximum spanning tree from the score matrix. Then, visual contexts are encoded by bidirectional TreeLSTM and decoded by task-specific models. We develop a hybrid learning procedure which integrates end-task supervised learning and the tree structure reinforcement learning, where the formers evaluation result serves as a self-critic for the latters structure exploration. Experimental results on two benchmarks, which require reasoning over contexts: Visual Genome for scene graph generation and VQA2.0 for visual Q&A, show that VCTree outperforms state-of-the-art results while discovering interpretable visual context structures.

الرؤية الحاسوبية وتمييز الأنماط