Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments

216 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Youngwoon Lee

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jun Yamada - Youngwoon Lee - Gautam Salhotra

علم الروبوتات الذكاء الاصطناعي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Deep reinforcement learning (RL) agents are able to learn contact-rich manipulation tasks by maximizing a reward signal, but require large amounts of experience, especially in environments with many obstacles that complicate exploration. In contrast, motion planners use explicit models of the agent and environment to plan collision-free paths to faraway goals, but suffer from inaccurate models in tasks that require contacts with the environment. To combine the benefits of both approaches, we propose motion planner augmented RL (MoPA-RL) which augments the action space of an RL agent with the long-horizon planning capabilities of motion planners. Based on the magnitude of the action, our approach smoothly transitions between directly executing the action and invoking a motion planner. We evaluate our approach on various simulated manipulation tasks and compare it to alternative action spaces in terms of learning efficiency and safety. The experiments demonstrate that MoPA-RL increases learning efficiency, leads to a faster exploration, and results in safer policies that avoid collisions with the environment. Videos and code are available at https://clvrai.com/mopa-rl .

قيم البحث

361 - Timothee Anne , Jack Wilkinson , Zhibin Li 2021

This work developed a meta-learning approach that adapts the control policy on the fly to different changing conditions for robust locomotion. The proposed method constantly updates the interaction model, samples feasible sequences of actions of esti mated the state-action trajectories, and then applies the optimal actions to maximize the reward. To achieve online model adaptation, our proposed method learns different latent vectors of each training condition, which are selected online given the newly collected data. Our work designs appropriate state space and reward functions, and optimizes feasible actions in an MPC fashion which are then sampled directly in the joint space considering constraints, hence requiring no prior design of specific walking gaits. We further demonstrate the robots capability of detecting unexpected changes during interaction and adapting control policies quickly. The extensive validation on the SpotMicro robot in a physics simulation shows adaptive and robust locomotion skills under varying ground friction, external pushes, and different robot models including hardware faults and changes.

علم الروبوتات الذكاء الاصطناعي

What Matters in Learning from Offline Human Demonstrations for Robot Manipulation

455 - Ajay Mandlekar , Danfei Xu , Josiah Wong 2021

Imitating human demonstrations is a promising approach to endow robots with various manipulation capabilities. While recent advances have been made in imitation learning and batch (offline) reinforcement learning, a lack of open-source human datasets and reproducible learning methods make assessing the state of the field difficult. In this paper, we conduct an extensive study of six offline learning algorithms for robot manipulation on five simulated and three real-world multi-stage manipulation tasks of varying complexity, and with datasets of varying quality. Our study analyzes the most critical challenges when learning from offline human data for manipulation. Based on the study, we derive a series of lessons including the sensitivity to different algorithmic design choices, the dependence on the quality of the demonstrations, and the variability based on the stopping criteria due to the different objectives in training and evaluation. We also highlight opportunities for learning from human datasets, such as the ability to learn proficient policies on challenging, multi-stage tasks beyond the scope of current reinforcement learning methods, and the ability to easily scale to natural, real-world manipulation scenarios where only raw sensory signals are available. We have open-sourced our datasets and all algorithm implementations to facilitate future research and fair comparisons in learning from human demonstration data. Codebase, datasets, trained models, and more available at https://arise-initiative.github.io/robomimic-web/

علم الروبوتات الذكاء الاصطناعي التعلم الآلي

Decentralized Structural-RNN for Robot Crowd Navigation with Deep Reinforcement Learning

176 - Shuijing Liu , Peixin Chang , Weihang Liang 2020

Safe and efficient navigation through human crowds is an essential capability for mobile robots. Previous work on robot crowd navigation assumes that the dynamics of all agents are known and well-defined. In addition, the performance of previous meth ods deteriorates in partially observable environments and environments with dense crowds. To tackle these problems, we propose decentralized structural-Recurrent Neural Network (DS-RNN), a novel network that reasons about spatial and temporal relationships for robot decision making in crowd navigation. We train our network with model-free deep reinforcement learning without any expert supervision. We demonstrate that our model outperforms previous methods in challenging crowd navigation scenarios. We successfully transfer the policy learned in the simulator to a real-world TurtleBot 2i.

علم الروبوتات الذكاء الاصطناعي التعلم الآلي

MAPPER: Multi-Agent Path Planning with Evolutionary Reinforcement Learning in Mixed Dynamic Environments

98 - Zuxin Liu , Baiming Chen , Hongyi Zhou 2020

Multi-agent navigation in dynamic environments is of great industrial value when deploying a large scale fleet of robot to real-world applications. This paper proposes a decentralized partially observable multi-agent path planning with evolutionary r einforcement learning (MAPPER) method to learn an effective local planning policy in mixed dynamic environments. Reinforcement learning-based methods usually suffer performance degradation on long-horizon tasks with goal-conditioned sparse rewards, so we decompose the long-range navigation task into many easier sub-tasks under the guidance of a global planner, which increases agents performance in large environments. Moreover, most existing multi-agent planning approaches assume either perfect information of the surrounding environment or homogeneity of nearby dynamic agents, which may not hold in practice. Our approach models dynamic obstacles behavior with an image-based representation and trains a policy in mixed dynamic environments without homogeneity assumption. To ensure multi-agent training stability and performance, we propose an evolutionary training approach that can be easily scaled to large and complex environments. Experiments show that MAPPER is able to achieve higher success rates and more stable performance when exposed to a large number of non-cooperative dynamic obstacles compared with traditional reaction-based planner LRA* and the state-of-the-art learning-based method.

علم الروبوتات الذكاء الاصطناعي التعلم الآلي

R-SARL: Crowd-aware Navigation Based Deep Reinforcement Learning for Nonholonomic Robot in Complex Environments

79 - Yanying Zhou , Shijie Li , Jochen Garcke 2021

Robot navigation in a safe way for complex and crowded situations is studied in this work. When facing complex environments with both static and dynamic obstacles, in existing works unicycle nonholonomic robots are prone to two extreme behaviors, one is to fall into dead ends formed by obstacles, and the other is to not complete the navigation task in time due to excessive collision avoidance.As a result, we propose the R-SARL framework, which is based on a deep reinforcement learning algorithm and where we augment the reward function to avoid collisions. In particular, we estimate unsafe interactions between the robot and obstacles in a look-ahead distance and penalize accordingly, so that the robot can avoid collisions in advance and reach its destination safely.Furthermore, we penalize frequent excessive detours to reduce the timeout and thus improve the efficiency of navigation.We test our method in various challenging and complex crowd navigation tasks. The results show that our method improves navigation performance and outperforms state-of-the-art methods.

علم الروبوتات