
Composable Action-Conditioned Predictors: Flexible Off-Policy Learning for Robot Navigation

Published by: Gregory Kahn
Publication date: 2018
Research field: Informatics Engineering
Language: English

A general-purpose intelligent robot must be able to learn autonomously and to accomplish multiple tasks in order to be deployed in the real world. However, standard reinforcement learning approaches learn separate task-specific policies and assume the reward function for each task is known a priori. We propose a framework that learns event cues from off-policy data and can flexibly combine these event cues at test time to accomplish different tasks. These event cue labels are not assumed to be known a priori, but are instead labeled using learned models, such as computer vision detectors, and then "backed up in time" using an action-conditioned predictive model. We show that a simulated robotic car and a real-world RC car can gather data and train fully autonomously, without any human-provided labels beyond those needed to train the detectors, and can then accomplish a variety of different tasks at test time. Videos of the experiments and code can be found at https://github.com/gkahn13/CAPs
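The composition idea lends itself to a short sketch. Below is a minimal, hypothetical illustration (not the authors' released code; the network sizes, cue names, and the random-shooting planner are all assumptions) of an action-conditioned predictor that outputs per-step event-cue probabilities for a candidate action sequence, with the cues combined into a task-specific score at test time:

import torch
import torch.nn as nn

HORIZON, ACTION_DIM, OBS_DIM, N_CUES = 8, 2, 16, 2  # hypothetical sizes

class ActionConditionedPredictor(nn.Module):
    """Predicts per-step event-cue probabilities for a planned action sequence."""
    def __init__(self):
        super().__init__()
        self.obs_enc = nn.Linear(OBS_DIM, 64)
        self.rnn = nn.GRU(ACTION_DIM, 64, batch_first=True)
        self.head = nn.Linear(64, N_CUES)

    def forward(self, obs, actions):  # obs: (B, OBS_DIM); actions: (B, H, ACTION_DIM)
        h0 = torch.tanh(self.obs_enc(obs)).unsqueeze(0)
        out, _ = self.rnn(actions, h0)        # roll the latent state forward under the actions
        return torch.sigmoid(self.head(out))  # (B, H, N_CUES) cue probabilities per step

def plan(model, obs, n_candidates=128):
    """Test-time composition: score sampled action sequences with a task-specific
    combination of cues (here: avoid collision while staying on the road)."""
    actions = torch.rand(n_candidates, HORIZON, ACTION_DIM) * 2 - 1
    cues = model(obs.expand(n_candidates, -1), actions)
    collision, on_road = cues[..., 0], cues[..., 1]
    score = ((1 - collision) * on_road).sum(dim=1)  # task = weighted cue combination
    return actions[score.argmax()]                  # execute best sequence (MPC-style)

model = ActionConditionedPredictor()
best = plan(model, torch.zeros(1, OBS_DIM))
print(best.shape)  # (HORIZON, ACTION_DIM)

The key property is that the predictor itself is task-agnostic: swapping the scoring function inside plan() retargets the same trained model to a different task without retraining.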


Read also

Safe and efficient navigation through human crowds is an essential capability for mobile robots. Previous work on robot crowd navigation assumes that the dynamics of all agents are known and well-defined. In addition, the performance of previous methods deteriorates in partially observable environments and environments with dense crowds. To tackle these problems, we propose decentralized structural-Recurrent Neural Network (DS-RNN), a novel network that reasons about spatial and temporal relationships for robot decision making in crowd navigation. We train our network with model-free deep reinforcement learning without any expert supervision. We demonstrate that our model outperforms previous methods in challenging crowd navigation scenarios. We successfully transfer the policy learned in the simulator to a real-world TurtleBot 2i.
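As a rough illustration of the kind of architecture described (a hypothetical sketch, not the published DS-RNN; the dimensions and the attention pooling are assumptions), a recurrent crowd-navigation policy can combine per-human spatial features with a temporal memory:

import torch
import torch.nn as nn

class CrowdNavPolicy(nn.Module):
    def __init__(self, robot_dim=9, human_dim=5, hidden=64, n_actions=5):
        super().__init__()
        self.spatial = nn.Linear(robot_dim + human_dim, hidden)  # robot-human edge features
        self.attn = nn.Linear(hidden, 1)                         # weight each neighbor
        self.temporal = nn.GRUCell(hidden + robot_dim, hidden)   # memory across time steps
        self.policy = nn.Linear(hidden, n_actions)

    def forward(self, robot, humans, h):
        # robot: (B, robot_dim); humans: (B, N, human_dim); h: (B, hidden)
        pairs = torch.cat([robot.unsqueeze(1).expand(-1, humans.size(1), -1), humans], -1)
        edges = torch.relu(self.spatial(pairs))   # (B, N, hidden) spatial relationships
        w = torch.softmax(self.attn(edges), dim=1)
        crowd = (w * edges).sum(dim=1)            # attention-pooled crowd state
        h = self.temporal(torch.cat([crowd, robot], -1), h)
        return self.policy(h), h                  # action logits + updated memory

policy = CrowdNavPolicy()
logits, h = policy(torch.zeros(1, 9), torch.zeros(1, 3, 5), torch.zeros(1, 64))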
Safety is of great importance in multi-robot navigation problems. In this paper, we propose a control barrier function (CBF) based optimizer that ensures robot safety with both high probability and flexibility, using only sensor measurements. The optimizer takes action commands from the policy network as initial values and then refines potentially dangerous ones back into safe regions. With the help of a deep transition model that predicts the evolution of the surrounding dynamics and the consequences of different actions, the CBF module can guide the optimization over a reasonable time horizon. We also present a novel joint training framework that improves cooperation between the Reinforcement Learning (RL) based policy and the CBF-based optimizer in both training and inference by utilizing reward feedback from the CBF module. We observe that a policy trained with our method achieves a higher success rate while maintaining the safety of multiple robots in significantly fewer episodes than other methods. Experiments are conducted in multiple scenarios in both simulation and the real world; the results demonstrate the effectiveness of our method in maintaining the safety of multi-robot navigation. Code is available at https://github.com/YuxiangCui/MARL-OCBF
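The CBF-as-action-refiner idea can be sketched generically. The following toy example is an assumption-laden simplification: it uses a known control-affine model and a hand-written barrier function, whereas the paper relies on a learned deep transition model and sensor measurements. It projects an RL action onto the safe set with a small quadratic program via cvxpy:

import numpy as np
import cvxpy as cp

def cbf_filter(u_rl, h, dh_dx, f, g, alpha=1.0):
    """Minimally modify the RL action u_rl so the barrier h(x) >= 0 is preserved."""
    u = cp.Variable(len(u_rl))
    # Barrier condition: dh/dt + alpha * h >= 0 keeps h(x) nonnegative over time.
    constraint = dh_dx @ (f + g @ u) + alpha * h >= 0
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_rl)), [constraint])
    prob.solve()
    return u.value  # refined, safe action

# Toy 2D single integrator (x' = u) keeping distance to an obstacle: h = ||x||^2 - r^2
x = np.array([1.0, 0.5]); r = 0.8
u_safe = cbf_filter(u_rl=np.array([-1.0, -1.0]),   # dangerous command, drives toward obstacle
                    h=x @ x - r**2, dh_dx=2 * x,
                    f=np.zeros(2), g=np.eye(2))
print(u_safe)  # refined command satisfying the barrier condition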
In the current state of evolution of Soccer 3D, motion control is a key factor in a team's performance. Recent work takes advantage of model-free approaches based on Machine Learning to exploit robot dynamics and obtain faster locomotion skills, achieving running policies and thereby opening a new research direction in the Soccer 3D environment. In this work, we present a methodology based on Deep Reinforcement Learning that learns running skills without any prior knowledge, using a neural network whose inputs are related to the robot's dynamics. Our results outperform the previous state-of-the-art sprint velocity reported in the Soccer 3D literature by a significant margin, and also demonstrate improved sample efficiency, learning to run in just a few hours. We report our results by analyzing the training procedure and evaluating the policies in terms of speed, reliability, and human similarity. Finally, we present key factors that led us to improve on previous results and share some ideas for future work.
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction. To accomplish this, humans need easy and effective ways of specifying tasks to the robot. Goal images are one popular form of task specification, as they are already grounded in the robot's observation space. However, goal images also have a number of drawbacks: they are inconvenient for humans to provide, they can over-specify the desired behavior, leading to a sparse reward signal, or they can under-specify task information in the case of non-goal-reaching tasks. Natural language provides a convenient and flexible alternative for task specification, but comes with the challenge of grounding language in the robot's observation space. To scalably learn this grounding, we propose to leverage offline robot datasets (including highly sub-optimal, autonomously collected data) with crowd-sourced natural language labels. With this data, we learn a simple classifier which predicts whether a change in state completes a language instruction. This provides a language-conditioned reward function that can then be used for offline multi-task RL. In our experiments, we find that on language-conditioned manipulation tasks our approach outperforms both goal-image specification and language-conditioned imitation techniques by more than 25%, and is able to perform visuomotor tasks from natural language, such as "open the right drawer" and "move the stapler", on a Franka Emika Panda robot.
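The reward-labeling step has a simple form that can be sketched as follows (a hypothetical sketch; the encoder details, dimensions, and the name LangReward are assumptions, not the paper's implementation). A classifier scores whether the change from the initial state to the current state completes the instruction, and that score is reused as the reward for offline RL:

import torch
import torch.nn as nn

class LangReward(nn.Module):
    def __init__(self, state_dim=128, lang_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim + lang_dim, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, s0, st, lang):
        # s0/st: encoded initial and current observations; lang: instruction embedding
        p = torch.sigmoid(self.net(torch.cat([s0, st, lang], -1)))
        return p.squeeze(-1)  # probability that the change (s0 -> st) completes the instruction

# Trained on crowd-sourced (trajectory, instruction, completed?) labels; afterwards every
# offline transition gets reward r_t = LangReward(s0, s_t, lang) for multi-task RL.
reward_fn = LangReward()
r = reward_fn(torch.zeros(4, 128), torch.zeros(4, 128), torch.zeros(4, 64))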
In this paper, we present the Role Playing Learning (RPL) scheme for a mobile robot to navigate socially with its human companion in populated environments. Neural networks (NN) are constructed to parameterize a stochastic policy that directly maps sensory data collected by the robot to its velocity outputs, while respecting a set of social norms. An efficient simulative learning environment is built with maps and pedestrian trajectories collected from a number of real-world crowd datasets. In each learning iteration, a robot equipped with the NN policy is created virtually in the learning environment to play the role of an accompanying pedestrian and navigate towards a goal in a socially concomitant manner. We therefore call this process Role Playing Learning, which is formulated under a reinforcement learning (RL) framework. The NN policy is optimized end-to-end using Trust Region Policy Optimization (TRPO), taking into account the imperfection of the robot's sensor measurements. Simulation and experimental results are provided to demonstrate the efficacy and superiority of our method.
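For intuition, a stochastic velocity policy of the kind described might look like the following sketch (hypothetical sizes; a diagonal Gaussian over linear and angular velocity, with TRPO itself omitted since any standard implementation can optimize the resulting log-probabilities):

import torch
import torch.nn as nn

class VelocityPolicy(nn.Module):
    def __init__(self, obs_dim=40, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, hidden), nn.Tanh())
        self.mean = nn.Linear(hidden, 2)             # (linear v, angular w)
        self.log_std = nn.Parameter(torch.zeros(2))  # state-independent exploration noise

    def forward(self, obs):
        mu = self.mean(self.body(obs))
        return torch.distributions.Normal(mu, self.log_std.exp())

policy = VelocityPolicy()
dist = policy(torch.randn(1, 40))
v_cmd = dist.sample()                 # stochastic velocity command for exploration
logp = dist.log_prob(v_cmd).sum(-1)   # fed to TRPO's surrogate objective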
