
KETO: Learning Keypoint Representations for Tool Manipulation

Published by Zengyi Qin
Publication date: 2019
Research field: Informatics Engineering
Language: English





We aim to develop an algorithm for robots to manipulate novel objects as tools for completing different task goals. An efficient and informative representation would facilitate the effectiveness and generalization of such algorithms. For this purpose, we present KETO, a framework of learning keypoint representations of tool-based manipulation. For each task, a set of task-specific keypoints is jointly predicted from 3D point clouds of the tool object by a deep neural network. These keypoints offer a concise and informative description of the object to determine grasps and subsequent manipulation actions. The model is learned from self-supervised robot interactions in the task environment without the need for explicit human annotations. We evaluate our framework in three manipulation tasks with tool use. Our model consistently outperforms state-of-the-art methods in terms of task success rates. Qualitative results of keypoint prediction and tool generation are shown to visualize the learned representations.




Read also

Humans have impressive generalization capabilities when it comes to manipulating objects and tools in completely novel environments. These capabilities are, at least partially, a result of humans having internal models of their bodies and any grasped object. How to learn such body schemas for robots remains an open problem. In this work, we develop a self-supervised approach that can extend a robot's kinematic model when grasping an object from visual latent representations. Our framework comprises two components: (1) we present a multi-modal keypoint detector: an autoencoder architecture trained by fusing proprioception and vision to predict visual keypoints on an object; (2) we show how we can use our learned keypoint detector to learn an extension of the kinematic chain by regressing virtual joints from the predicted visual keypoints. Our evaluation shows that our approach learns to consistently predict visual keypoints on objects in the manipulator's hand, and thus can easily facilitate learning an extended kinematic chain to include the object grasped in various configurations, from a few seconds of visual data. Finally, we show that this extended kinematic chain lends itself to object manipulation tasks such as placing a grasped object, and we present experiments in simulation and on hardware.
3D scene representation for robot manipulation should capture three key object properties: permanency -- objects that become occluded over time continue to exist; amodal completeness -- objects have 3D occupancy, even if only partial observations are available; spatiotemporal continuity -- the movement of each object is continuous over space and time. In this paper, we introduce 3D Dynamic Scene Representation (DSR), a 3D volumetric scene representation that simultaneously discovers, tracks, reconstructs objects, and predicts their dynamics while capturing all three properties. We further propose DSR-Net, which learns to aggregate visual observations over multiple interactions to gradually build and refine DSR. Our model achieves state-of-the-art performance in modeling 3D scene dynamics with DSR on both simulated and real data. Combined with model predictive control, DSR-Net enables accurate planning in downstream robotic manipulation tasks such as planar pushing. Video is available at https://youtu.be/GQjYG3nQJ80.
Tool manipulation is vital for facilitating robots to complete challenging task goals. It requires reasoning about the desired effect of the task and thus properly grasping and manipulating the tool to achieve the task. Task-agnostic grasping optimizes for grasp robustness while ignoring crucial task-specific constraints. In this paper, we propose the Task-Oriented Grasping Network (TOG-Net) to jointly optimize both task-oriented grasping of a tool and the manipulation policy for that tool. The training process of the model is based on large-scale simulated self-supervision with procedurally generated tool objects. We perform both simulated and real-world experiments on two tool-based manipulation tasks: sweeping and hammering. Our model achieves overall 71.1% task success rate for sweeping and 80.0% task success rate for hammering. Supplementary material is available at: bit.ly/task-oriented-grasp
Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state, where the ability to reason about spatial relationships among object entities from raw sensor inputs is crucial. Prior works relying on explicit state estimation or end-to-end learning struggle with novel objects. In this work, we propose SORNet (Spatial Object-Centric Representation Network), which extracts object-centric representations from RGB images conditioned on canonical views of the objects of interest. We show that the object embeddings learned by SORNet generalize zero-shot to unseen object entities on three spatial reasoning tasks: spatial relationship classification, skill precondition classification and relative direction regression, significantly outperforming baselines. Further, we present real-world robotic experiments demonstrating the usage of the learned object embeddings in task planning for sequential manipulation.
Deep reinforcement learning has made significant progress in robotic manipulation tasks and it works well in the ideal disturbance-free environment. However, in a real-world environment, both internal and external disturbances are inevitable, thus the performance of the trained policy will dramatically drop. To improve the robustness of the policy, we introduce the adversarial training mechanism to the robotic manipulation tasks in this paper, and an adversarial skill learning algorithm based on soft actor-critic (SAC) is proposed for robust manipulation. Extensive experiments are conducted to demonstrate that the learned policy is robust to internal and external disturbances. Additionally, the proposed algorithm is evaluated in both the simulation environment and on the real robotic platform.