Learning robotic manipulation through reinforcement learning (RL) using only sparse reward signals is still considered a largely unsolved problem. Leveraging human demonstrations can make the learning process more sample-efficient, but obtaining high-quality demonstrations can be costly or infeasible. In this paper we propose a novel approach that introduces object-level demonstrations, i.e., examples of where the objects should be at each state. These demonstrations are generated automatically through RL and hence require no expert knowledge. We observe that, during a manipulation task, an object is moved from an initial to a final position. When seen from the point of view of the object being manipulated, this induces a locomotion task that can be decoupled from the manipulation task and learnt through a physically realistic simulator. The resulting object-level trajectories, called simulated locomotion demonstrations (SLDs), are then leveraged to define auxiliary rewards that are used to learn the manipulation policy. The proposed approach has been evaluated on 13 tasks of increasing complexity, and has been shown to achieve higher success rates and faster learning compared to alternative algorithms. SLDs are especially beneficial for tasks like multi-object stacking and non-rigid object manipulation.
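The abstract does not specify the exact form of the SLD auxiliary reward. As a minimal sketch, assuming the auxiliary term penalizes the distance between the manipulated object's current position and the corresponding waypoint of a precomputed SLD trajectory, the reward computation might look as follows (the names sld_auxiliary_reward, object_pos, and sld_trajectory are illustrative, not taken from the paper):

    import numpy as np

    def sld_auxiliary_reward(object_pos, sld_trajectory, t, scale=1.0):
        """Hypothetical dense auxiliary reward derived from a simulated
        locomotion demonstration (SLD).

        object_pos:     (3,) current Cartesian position of the manipulated object
        sld_trajectory: (T, 3) object positions from the SLD rollout
        t:              current timestep, clipped to the demonstration length
        scale:          weight of the auxiliary term (assumed hyperparameter)
        """
        target = sld_trajectory[min(t, len(sld_trajectory) - 1)]
        # Reward the manipulation policy for keeping the object close to
        # where the object-level demonstration says it should be at this
        # point in the task.
        return -scale * np.linalg.norm(object_pos - target)

During training, a dense term of this kind would supplement the sparse task reward, giving the manipulation policy a learning signal even before the first task success is observed.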