No Arabic abstract
We propose a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent. We apply this approach to robotic manipulation tasks and train end-to-end visuomotor policies that map directly from RGB camera inputs to joint velocities. We demonstrate that our approach can solve a wide variety of visuomotor tasks, for which engineering a scripted controller would be laborious. In experiments, our reinforcement and imitation agent achieves significantly better performances than agents trained with reinforcement learning or imitation learning alone. We also illustrate that these policies, trained with large visual and dynamics variations, can achieve preliminary successes in zero-shot sim2real transfer. A brief visual description of this work can be viewed in https://youtu.be/EDl8SQUNjj0
The learning efficiency and generalization ability of an intelligent agent can be greatly improved by utilizing a useful set of skills. However, the design of robot skills can often be intractable in real-world applications due to the prohibitive amount of effort and expertise that it requires. In this work, we introduce Skill Learning In Diversified Environments (SLIDE), a method to discover generalizable skills via automated generation of a diverse set of tasks. As opposed to prior work on unsupervised discovery of skills which incentivizes the skills to produce different outcomes in the same environment, our method pairs each skill with a unique task produced by a trainable task generator. To encourage generalizable skills to emerge, our method trains each skill to specialize in the paired task and maximizes the diversity of the generated tasks. A task discriminator defined on the robot behaviors in the generated tasks is jointly trained to estimate the evidence lower bound of the diversity objective. The learned skills can then be composed in a hierarchical reinforcement learning algorithm to solve unseen target tasks. We demonstrate that the proposed method can effectively learn a variety of robot skills in two tabletop manipulation domains. Our results suggest that the learned skills can effectively improve the robots performance in various unseen target tasks compared to existing reinforcement learning and skill learning methods.
For robots to work alongside humans and perform in unstructured environments, they must learn new motion skills and adapt them to unseen situations on the fly. This demands learning models that capture relevant motion patterns, while offering enough flexibility to adapt the encoded skills to new requirements, such as dynamic obstacle avoidance. We introduce a Riemannian manifold perspective on this problem, and propose to learn a Riemannian manifold from human demonstrations on which geodesics are natural motion skills. We realize this with a variational autoencoder (VAE) over the space of position and orientations of the robot end-effector. Geodesic motion skills let a robot plan movements from and to arbitrary points on the data manifold. They also provide a straightforward method to avoid obstacles by redefining the ambient metric in an online fashion. Moreover, geodesics naturally exploit the manifold resulting from multiple--mode tasks to design motions that were not explicitly demonstrated previously. We test our learning framework using a 7-DoF robotic manipulator, where the robot satisfactorily learns and reproduces realistic skills featuring elaborated motion patterns, avoids previously unseen obstacles, and generates novel movements in multiple-mode settings.
This paper studies how to improve the generalization performance and learning speed of the navigation agents trained with deep reinforcement learning (DRL). DRL exhibits huge potential in mapless navigation, but DRL agents performing well in training scenarios are found to perform poorly in unfamiliar real-world scenarios. In this work, we present the representation of LiDAR readings as a key factor behind agents performance degradation and propose a simple but powerful input pre-processing (IP) approach to improve the agents performance. As this approach uses adaptively parametric reciprocal functions to pre-process LiDAR readings, we refer to this approach as IPAPRec and its normalized version as IPAPRecN. IPAPRec/IPAPRecN can highlight important short-distance values and compress the range of less-important long-distance values in laser scans, which well addressed the issues induced by conventional representations of laser scans. Their high performance is validated by extensive simulation and real-world experiments. The results show that our methods can substantially improve agents success rates and greatly reduce the training time compared to conventional methods.
Humans are capable of learning a new behavior by observing others perform the skill. Robots can also implement this by imitation learning. Furthermore, if with external guidance, humans will master the new behavior more efficiently. So how can robots implement this? To address the issue, we present Federated Imitation Learning (FIL) in the paper. Firstly, a knowledge fusion algorithm deployed on the cloud for fusing knowledge from local robots is presented. Then, effective transfer learning methods in FIL are introduced. With FIL, a robot is capable of utilizing knowledge from other robots to increase its imitation learning. FIL considers information privacy and data heterogeneity when robots share knowledge. It is suitable to be deployed in cloud robotic systems. Finally, we conduct experiments of a simplified self-driving task for robots (cars). The experimental results demonstrate that FIL is capable of increasing imitation learning of local robots in cloud robotic systems.
We consider the problem of learning multi-stage vision-based tasks on a real robot from a single video of a human performing the task, while leveraging demonstration data of subtasks with other objects. This problem presents a number of major challenges. Video demonstrations without teleoperation are easy for humans to provide, but do not provide any direct supervision. Learning policies from raw pixels enables full generality but calls for large function approximators with many parameters to be learned. Finally, compound tasks can require impractical amounts of demonstration data, when treated as a monolithic skill. To address these challenges, we propose a method that learns both how to learn primitive behaviors from video demonstrations and how to dynamically compose these behaviors to perform multi-stage tasks by watching a human demonstrator. Our results on a simulated Sawyer robot and real PR2 robot illustrate our method for learning a variety of order fulfillment and kitchen serving tasks with novel objects and raw pixel inputs.