Learning to Play by Imitating Humans


Published by Pierre Sermanet

Publication date: 2020

Research field: Informatics Engineering

Paper language: English

Authors: Rostam Dinyari, Pierre Sermanet, Corey Lynch

Robotics, Artificial Intelligence, Machine Learning


Abstract (English)

Acquiring multiple skills has commonly involved collecting a large number of expert demonstrations per task or engineering custom reward functions. Recently, it has been shown that a diverse set of skills can be acquired by self-supervising control on top of human teleoperated play data. Play is rich in state-space coverage, and a policy trained on this data can generalize to specific tasks at test time, outperforming policies trained on individual expert task demonstrations. In this work, we explore whether robots can learn to play, autonomously generating play data that can ultimately enhance performance. By training a behavioral cloning policy on a relatively small quantity of human play, we autonomously generate a large quantity of cloned play data that can be used as additional training data. We demonstrate that a general-purpose goal-conditioned policy trained on this augmented dataset substantially outperforms one trained only on the original human data on 18 difficult user-specified manipulation tasks in a simulated robotic tabletop environment. A video example of a robot imitating human play can be seen here: https://learning-to-play.github.io/videos/undirected_play1.mp4
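The data flow described in the abstract (clone a small amount of human play, roll the clone out to produce much more play, then train a goal-conditioned policy on both) can be sketched on a toy problem. This is a minimal illustration under loud assumptions, not the authors' implementation: the 1-D point robot, the scripted "human", the one-parameter linear policy, the hindsight-relabeling horizon, and every function name below (`human_play`, `relabel`, `fit_gain`, `cloned_play`) are hypothetical choices made only for this example.

```python
import random

def human_play(n_steps, rng):
    """Hypothetical stand-in for teleoperated play on a 1-D point robot."""
    trajectory, state, target = [], 0.0, 0.0
    for t in range(n_steps):
        if t % 10 == 0:                      # the "human" picks a new target
            target = rng.uniform(-1.0, 1.0)
        action = 0.5 * (target - state)      # move halfway toward it
        trajectory.append((state, action))
        state += action
    return trajectory

def relabel(trajectory, horizon=3):
    """Hindsight relabeling: the state reached `horizon` steps later is the goal."""
    return [(s, trajectory[t + horizon][0], a)
            for t, (s, a) in enumerate(trajectory[:-horizon])]

def fit_gain(examples):
    """Behavioral cloning of a one-parameter policy a = gain * (goal - state),
    fitted by closed-form least squares."""
    num = sum(a * (g - s) for s, g, a in examples)
    den = sum((g - s) ** 2 for s, g, a in examples)
    return num / den

def cloned_play(gain, n_steps, rng, noise=0.05):
    """Roll out the cloned policy toward self-chosen goals to produce new play."""
    trajectory, state, goal = [], 0.0, 0.0
    for t in range(n_steps):
        if t % 10 == 0:
            goal = rng.uniform(-1.0, 1.0)
        action = gain * (goal - state) + rng.gauss(0.0, noise)
        trajectory.append((state, action))
        state += action
    return trajectory

rng = random.Random(0)
human = human_play(50, rng)               # small human play dataset
bc_gain = fit_gain(relabel(human))        # clone the play behavior
cloned = cloned_play(bc_gain, 500, rng)   # much larger autonomous dataset
# Relabel each source separately so no goal is drawn across the boundary
# between the human trajectory and the cloned one.
augmented = relabel(human) + relabel(cloned)
final_gain = fit_gain(augmented)          # goal-conditioned policy on both
```

The sketch only shows how the datasets are combined; it says nothing about whether augmentation helps, which the abstract reports from experiments on 18 manipulation tasks.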


Alessandro Bonardi, Stephen James, Andrew J. Davison (2019)

Humans can naturally learn to execute a new task by seeing it performed by other individuals once, and then reproduce it in a variety of configurations. Endowing robots with this ability to imitate humans from a third-person view is an immediate and natural way of teaching new tasks. Only recently, through meta-learning, have there been successful attempts at one-shot imitation learning from humans; however, these approaches require a lot of human resources to collect the real-world data needed to train the robot. But is there a way to remove the need for real-world human demonstrations during training? We show that with Task-Embedded Control Networks, we can infer control policies by embedding human demonstrations that can condition a control policy and achieve one-shot imitation learning. Importantly, we do not use a real human arm to supply demonstrations during training, but instead leverage domain randomisation in an application that has not been seen before: sim-to-real transfer on humans. Upon evaluating our approach on pushing and placing tasks in both simulation and the real world, we show that, in comparison to a system trained on real-world data, we are able to achieve similar results by utilising only simulation data.

Robotics, Artificial Intelligence, Computer Vision and Pattern Recognition

Pixels to Plans: Learning Non-Prehensile Manipulation by Imitating a Planner

Tarik Tosun, Eric Mitchell, Ben Eisner (2019)

We present a novel method enabling robots to quickly learn to manipulate objects by leveraging a motion planner to generate expert training trajectories from a small amount of human-labeled data. In contrast to the traditional sense-plan-act cycle, we propose a deep learning architecture and training regimen called PtPNet that can estimate effective end-effector trajectories for manipulation directly from a single RGB-D image of an object. Additionally, we present a data collection and augmentation pipeline that enables the automatic generation of large numbers (millions) of training image and trajectory examples with almost no human labeling effort. We demonstrate our approach in a non-prehensile tool-based manipulation task, specifically picking up shoes with a hook. In hardware experiments, PtPNet generates motion plans (open-loop trajectories) that reliably (89% success over 189 trials) pick up four very different shoes from a range of positions and orientations, and reliably picks up a shoe it has never seen before. Compared with a traditional sense-plan-act paradigm, our system has the advantages of operating on sparse information (a single RGB-D frame), producing high-quality trajectories much faster than the expert planner (300 ms versus several seconds), and generalizing effectively to previously unseen shoes.

Robotics

Learning Agile Robotic Locomotion Skills by Imitating Animals

Xue Bin Peng, Erwin Coumans, Tingnan Zhang (2020)

Reproducing the diverse and agile locomotion skills of animals has been a longstanding challenge in robotics. While manually designed controllers have been able to emulate many complex behaviors, building such controllers involves a time-consuming and difficult development process, often requiring substantial expertise in the nuances of each skill. Reinforcement learning provides an appealing alternative for automating the manual effort involved in the development of controllers. However, designing learning objectives that elicit the desired behaviors from an agent can also require a great deal of skill-specific expertise. In this work, we present an imitation learning system that enables legged robots to learn agile locomotion skills by imitating real-world animals. We show that by leveraging reference motion data, a single learning-based approach is able to automatically synthesize controllers for a diverse repertoire of behaviors for legged robots. By incorporating sample-efficient domain adaptation techniques into the training process, our system is able to learn adaptive policies in simulation that can then be quickly adapted for real-world deployment. To demonstrate the effectiveness of our system, we train an 18-DoF quadruped robot to perform a variety of agile behaviors ranging from different locomotion gaits to dynamic hops and turns.

Robotics, Machine Learning

DeepSocNav: Social Navigation by Imitating Human Behaviors

Juan Pablo de Vicente, Alvaro Soto (2021)

Current datasets for training social behaviors are usually borrowed from surveillance applications that capture visual data from a bird's-eye perspective. This leaves aside precious relationships and visual cues that could be captured through a first-person view of a scene. In this work, we propose a strategy to exploit the power of current game engines, such as Unity, to transform pre-existing bird's-eye view datasets into a first-person view, in particular a depth view. Using this strategy, we are able to generate large volumes of synthetic data that can be used to pre-train a social navigation model. To test our ideas, we present DeepSocNav, a deep-learning-based model that takes advantage of the proposed approach to generate synthetic data. Furthermore, DeepSocNav includes a self-supervised auxiliary task, which consists of predicting the next depth frame that the agent will face. Our experiments show the benefits of the proposed model, which is able to outperform relevant baselines in terms of social navigation scores.

Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning

Training Humans to Train Robots Dynamic Motor Skills

Marina Y. Aoyama, Matthew Howard (2021)

Learning from demonstration (LfD) is commonly considered to be a natural and intuitive way to allow novice users to teach motor skills to robots. However, it is important to acknowledge that the effectiveness of LfD is heavily dependent on the quality of teaching, something that may not be assured with novices. It remains an open question how best to guide demonstrators to produce informative demonstrations beyond ad hoc advice for specific teaching tasks. To this end, this paper investigates the use of machine teaching to derive an index for determining the quality of demonstrations and evaluates its use in guiding and training novices to become better teachers. Experiments with a simple learner robot suggest that guidance and training of teachers through the proposed approach can lead to a decrease of up to 66.5% in the error of the learnt skill.

Robotics, Artificial Intelligence, Human-Computer Interaction