We present a robotic system for picking a target from a pile of objects, capable of finding and grasping the target by removing obstacles in the appropriate order. The fundamental idea is to segment instances with both visible and occluded masks, which we call "instance occlusion segmentation". To achieve this, we extend an existing instance segmentation model with a novel "relook" architecture, in which the model explicitly learns the inter-instance relationship. We also use image synthesis to make the system capable of handling new objects without human annotations. The experimental results show the effectiveness of the relook architecture compared with a conventional model, and of the image synthesis compared with a human-annotated dataset. We also demonstrate the capability of our system to pick a target in a cluttered environment with a real robot.
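To make the idea concrete, below is a minimal sketch (in PyTorch; not the authors' code) of a mask head that predicts both a visible and an occluded mask per instance. The "relook" step here is a simplified stand-in: it re-feeds the visible-mask prediction so the occluded branch can reason about what each instance hides. All layer sizes and module names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class OcclusionMaskHead(nn.Module):
    def __init__(self, in_channels=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
        )
        self.visible = nn.Conv2d(256, 1, 1)   # visible-region mask logits
        # "relook": occluded branch sees features plus the visible prediction
        self.relook = nn.Sequential(
            nn.Conv2d(256 + 1, 256, 3, padding=1), nn.ReLU(),
        )
        self.occluded = nn.Conv2d(256, 1, 1)  # occluded-region mask logits

    def forward(self, roi_features):
        h = self.trunk(roi_features)
        vis = self.visible(h)
        h2 = self.relook(torch.cat([h, vis], dim=1))
        occ = self.occluded(h2)
        return vis, occ

head = OcclusionMaskHead()
vis, occ = head(torch.randn(4, 256, 14, 14))  # 4 RoIs, 14x14 mask resolution
print(vis.shape, occ.shape)                   # torch.Size([4, 1, 14, 14]) each
```

With both masks per instance, an ordering heuristic such as "remove the instance whose occluded mask overlaps the target's visible mask" becomes straightforward to implement.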
In order to manipulate a deformable object, such as rope or cloth, in unstructured environments, robots need a way to estimate its current shape. However, tracking the shape of a deformable object can be challenging because of the object's high flexibility, (self-)occlusion, and interaction with obstacles. Building a high-fidelity physics simulation to aid in tracking is difficult for novel environments. Instead, we focus on tracking the object based on RGBD images, geometric motion estimates, and knowledge of obstacles. Our key contributions over previous work in this vein are: 1) a better way to handle severe occlusion by using a motion model to regularize the tracking estimate; and 2) the formulation of convex geometric constraints, which allow us to prevent self-intersection and penetration into known obstacles via a post-processing step. These contributions allow us to outperform previous methods by a large margin in terms of accuracy in scenarios with severe occlusion and obstacles.
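The post-processing idea can be illustrated as a small convex projection problem: find the configuration closest to the raw tracking estimate that satisfies the geometric constraints. The sketch below (using cvxpy; not the authors' code) approximates the obstacle locally by a half-space and bounds the stretch between neighboring nodes of a rope-like object; the plane, node count, and limits are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

raw = np.cumsum(np.random.randn(10, 3) * 0.05, axis=0)  # raw node estimates
a, b = np.array([0.0, 0.0, 1.0]), 0.0                    # floor plane: z >= 0

x = cp.Variable((10, 3))
constraints = [x @ a >= b]                               # no obstacle penetration
for i in range(9):                                       # limit stretching
    constraints.append(cp.norm(x[i + 1] - x[i]) <= 0.1)
prob = cp.Problem(cp.Minimize(cp.sum_squares(x - raw)), constraints)
prob.solve()
print(x.value)  # corrected node positions, feasible w.r.t. the constraints
```

Because both the objective and the constraints are convex, the projection has a unique optimum and solves reliably at every frame, which is what makes it suitable as a per-frame post-processing step.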
We present joint learning of instance and semantic segmentation for visible and occluded region masks. Sharing the feature extractor with instance occlusion segmentation, we introduce semantic occlusion segmentation into the instance segmentation model. This joint learning fuses the instance-level and image-level reasoning of mask prediction across the different segmentation tasks, which was missing in previous work that learned instance segmentation only (instance-only). In the experiments, we evaluated the proposed joint learning against instance-only learning on the test dataset. We also applied the joint-learning model to two different types of robotic pick-and-place tasks (random and target picking) and evaluated its effectiveness in achieving real-world robotic tasks.
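A minimal sketch of the joint-learning setup, under the assumption that the two tasks share one feature extractor and their losses are simply summed (the backbone here is a single convolution standing in for a full instance segmentation model, and all names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointModel(nn.Module):
    def __init__(self, num_classes=40):
        super().__init__()
        self.backbone = nn.Conv2d(3, 64, 3, padding=1)      # shared feature extractor
        self.semantic = nn.Conv2d(64, num_classes * 2, 1)   # visible+occluded per class
        self.instance = nn.Conv2d(64, 2, 1)                 # visible+occluded per instance (simplified)

    def forward(self, img):
        feat = F.relu(self.backbone(img))
        return self.semantic(feat), self.instance(feat)

model = JointModel()
sem_logits, ins_logits = model(torch.randn(1, 3, 64, 64))
sem_target = torch.randint(0, 2, sem_logits.shape).float()
ins_target = torch.randint(0, 2, ins_logits.shape).float()
# Joint objective: instance loss + semantic occlusion loss on shared features
loss = (F.binary_cross_entropy_with_logits(ins_logits, ins_target)
        + F.binary_cross_entropy_with_logits(sem_logits, sem_target))
loss.backward()  # gradients from both tasks update the shared backbone
```

The point of the shared backbone is that gradients from the image-level semantic task regularize the features used by the instance branch, and vice versa.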
Picking objects in a narrow space such as a shelf bin is an important task for a humanoid robot when extracting a target object from its environment. In such situations, however, there are many occlusions between the camera and the objects, which makes it difficult to segment the target object three-dimensionally because of the lack of three-dimensional sensor input. We address this problem by accumulating segmentation results from multiple camera angles and generating a voxel model of the target object. Our approach consists of two components: the first is object probability prediction for the input image with convolutional networks, and the second is generation of a voxel grid map designed for object segmentation. We evaluated the method in a picking-task experiment with target objects in narrow shelf bins. Our method generates dense 3D object segments even under occlusion, and the real robot successfully picked target objects from the narrow space.
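One common way to realize this accumulation step is log-odds fusion of per-view object probabilities into a voxel grid, sketched below. The grid size, resolution, and the random placeholder inputs are illustrative assumptions; a real system would back-project the network's probability map through calibrated camera extrinsics for each view.

```python
import numpy as np

GRID, RES = 64, 0.005                       # 64^3 voxels, 5 mm per voxel
log_odds = np.zeros((GRID, GRID, GRID))

def fuse(points_world, probs):
    """points_world: (N,3) 3D points from masked depth; probs: (N,) object probability."""
    idx = np.floor(points_world / RES).astype(int)
    ok = np.all((idx >= 0) & (idx < GRID), axis=1)
    i, p = idx[ok], np.clip(probs[ok], 1e-3, 1 - 1e-3)
    np.add.at(log_odds, (i[:, 0], i[:, 1], i[:, 2]), np.log(p / (1 - p)))

# Fuse several views, then threshold to obtain the dense object segment
for _ in range(5):
    pts = np.random.rand(1000, 3) * GRID * RES   # placeholder for one camera view
    fuse(pts, np.random.rand(1000))
object_voxels = 1 / (1 + np.exp(-log_odds)) > 0.5
print(object_voxels.sum(), "voxels classified as object")
```

Log-odds accumulation is order-independent and lets weak evidence from many partially occluded views add up to a confident, dense 3D segment.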
This paper presents an AI system for waste localization and robotic grasping. The experimental setup is based on a parameter study to train a deep-learning network based on Mask-RCNN to locate waste in indoor and outdoor environments, using five different classes and generating a new waste dataset. The system first obtains RGBD data of the environment, then detects objects using the neural network. Next, the 3D object shape is computed using the network output and the depth channel. Finally, the shape is used to compute a grasp for a robot arm with a two-finger gripper. The objective is to classify the waste into groups to improve the recycling strategy.
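The step from a 2D detection mask plus the depth channel to a 3D shape is typically a pinhole back-projection, sketched below. The intrinsics (fx, fy, cx, cy) and the synthetic mask are illustrative assumptions, not values from the paper.

```python
import numpy as np

fx = fy = 525.0          # assumed focal lengths (pixels)
cx, cy = 319.5, 239.5    # assumed principal point

def mask_to_points(depth, mask):
    """depth: (H,W) in metres; mask: (H,W) bool from the segmentation network."""
    v, u = np.nonzero(mask & (depth > 0))
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)   # (N,3) point cloud of the object

depth = np.random.uniform(0.5, 1.5, (480, 640))
mask = np.zeros((480, 640), dtype=bool)
mask[200:280, 300:380] = True            # placeholder Mask-RCNN output
cloud = mask_to_points(depth, mask)
print(cloud.shape)                       # e.g. (6400, 3); input to grasp planning
```

The resulting point cloud is what a two-finger grasp planner would consume, e.g. to fit a principal axis and place the gripper perpendicular to it.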
We address the task of 6D pose estimation of known rigid objects from single input images in scenarios where the objects are partly occluded. Recent RGB-D-based methods are robust to moderate degrees of occlusion. For RGB inputs, no previous method works well for partly occluded objects. Our main contribution is the first deep learning-based system that estimates accurate poses for partly occluded objects from RGB-D and RGB input. We achieve this with a new instance-aware pipeline that decomposes 6D object pose estimation into a sequence of simpler steps, where each step removes a specific aspect of the problem. The first step localizes all known objects in the image using an instance segmentation network, and hence eliminates surrounding clutter and occluders. The second step densely maps pixels to 3D object surface positions, so-called object coordinates, using an encoder-decoder network, and hence eliminates object appearance. The third and final step predicts the 6D pose using geometric optimization. We demonstrate that we significantly outperform the state of the art for pose estimation of partly occluded objects for both RGB and RGB-D input.
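For the RGB case, the final geometric-optimization step can be realized as PnP with RANSAC over the predicted pixel-to-object-coordinate correspondences, as sketched below with OpenCV. The correspondences here are synthetic placeholders generated from a known pose, so the recovered translation can be checked against the ground truth; the intrinsics are illustrative assumptions.

```python
import numpy as np
import cv2

K = np.array([[525.0, 0.0, 319.5],
              [0.0, 525.0, 239.5],
              [0.0, 0.0, 1.0]])

obj_pts = np.random.uniform(-0.05, 0.05, (200, 3))  # predicted object coordinates
rvec_true = np.array([0.1, -0.2, 0.3])              # synthetic ground-truth pose
t_true = np.array([0.0, 0.0, 0.8])
img_pts, _ = cv2.projectPoints(obj_pts, rvec_true, t_true, K, None)

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    obj_pts.astype(np.float32), img_pts.astype(np.float32), K, None)
print(ok, tvec.ravel())  # recovered translation; should be close to t_true
```

RANSAC makes this step tolerant of wrong correspondences in the occluded regions, which is why eliminating clutter and appearance in the earlier steps pays off at pose-estimation time.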