Picking objects in a narrow space such as a shelf bin is an important task for a humanoid robot that must extract a target object from its environment. In such situations, however, there are many occlusions between the camera and the objects, and the resulting lack of three-dimensional sensor input makes it difficult to segment the target object in 3D. We address this problem by accumulating segmentation results over multiple camera angles and generating a voxel model of the target object. Our approach consists of two components: the first predicts per-pixel object probabilities for the input image with convolutional networks, and the second generates a voxel grid map designed for object segmentation. We evaluated the method in picking experiments with target objects in narrow shelf bins. Our method generates dense 3D object segments even under occlusion, and the real robot successfully picked target objects from the narrow space.
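One way to picture the accumulation step is standard log-odds fusion of per-pixel CNN probabilities into a voxel grid. The following is a minimal sketch under that assumption; the grid parameters, the pinhole back-projection, and all names are illustrative rather than the paper's exact formulation.

```python
import numpy as np

class ProbabilisticVoxelGrid:
    """Fuses per-pixel object probabilities from many views via log-odds."""
    def __init__(self, shape=(64, 64, 64), origin=(0.0, 0.0, 0.0), resolution=0.01):
        self.log_odds = np.zeros(shape)
        self.origin = np.asarray(origin)
        self.resolution = resolution

    def integrate(self, points_world, probs, eps=1e-6):
        """Add one view's evidence at the voxels containing its 3D points."""
        idx = np.floor((points_world - self.origin) / self.resolution).astype(int)
        ok = np.all((idx >= 0) & (idx < self.log_odds.shape), axis=1)
        p = np.clip(probs[ok], eps, 1.0 - eps)
        np.add.at(self.log_odds, tuple(idx[ok].T), np.log(p / (1.0 - p)))

    def object_voxels(self, threshold=0.5):
        """Boolean mask of voxels whose fused object probability passes threshold."""
        return 1.0 / (1.0 + np.exp(-self.log_odds)) > threshold

def backproject(depth, prob_map, K, T_world_cam):
    """Lift valid depth pixels to world points, paired with CNN probabilities."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    ok = z > 0
    rays = np.linalg.inv(K) @ np.vstack([u.ravel(), v.ravel(), np.ones(h * w)])
    pts_cam = (rays * z)[:, ok].T
    pts_world = pts_cam @ T_world_cam[:3, :3].T + T_world_cam[:3, 3]
    return pts_world, prob_map.ravel()[ok]
```

In this sketch, each camera angle contributes evidence to the voxels hit by its back-projected pixels, and thresholding the fused probability yields a dense 3D segment of the target object even where single views are occluded.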
Customized grippers have specially designed fingers that increase the contact area with workpieces and improve grasp robustness. However, grasp planning for customized grippers is challenging due to object variations, surface contacts, and the structural constraints of the grippers. In this paper, we propose a learning framework to plan robust grasps for customized grippers in real time. The framework contains a low-level optimization-based planner that searches for optimal grasps locally under object shape variations, and a high-level learning-based explorer that learns grasp exploration from previous grasp experience. The optimization-based planner uses iterative surface fitting (ISF) to simultaneously search for the optimal gripper transformation and finger displacement by minimizing the surface fitting error. The high-level explorer trains a region-based convolutional neural network (R-CNN) to propose good optimization regions, which keeps ISF from getting stuck in poor local optima and improves collision avoidance. The proposed R-CNN-ISF framework is able to account for the structural constraints of the gripper, learn a grasp exploration strategy from previous experience, and plan optimal grasps in cluttered environments in real time. The effectiveness of the algorithm is verified by experiments.
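The core of an ISF iteration can be pictured as a point-to-plane alignment between the finger contact surfaces and the object surface. The sketch below shows one such Gauss-Newton step under that reading; the finger-displacement search, the gripper constraints, and all names here are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def isf_step(finger_pts, obj_pts, obj_normals):
    """One point-to-plane step aligning finger contact points to the object surface."""
    nn = cKDTree(obj_pts)
    _, j = nn.query(finger_pts)                  # closest object point per finger point
    q, n = obj_pts[j], obj_normals[j]
    A = np.hstack([np.cross(finger_pts, n), n])  # rows: [p x n, n]
    b = np.einsum('ij,ij->i', q - finger_pts, n) # point-to-plane residuals
    x, *_ = np.linalg.lstsq(A, b, rcond=None)    # x = [rotation w, translation t]
    w, t = x[:3], x[3:]
    R = np.eye(3) + np.array([[0, -w[2], w[1]],
                              [w[2], 0, -w[0]],
                              [-w[1], w[0], 0]]) # small-angle rotation update
    return finger_pts @ R.T + t, float(np.abs(A @ x - b).mean())
```

In use, the step would be iterated until the fitting error converges; in the proposed framework, the R-CNN explorer supplies the initial region that seeds this local search.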
This paper proposes an iterative visual recognition system for learning-based randomized bin picking. Since the configuration of the randomly stacked objects at the current picking trial differs only partially from the configuration at the previous trial, we detect object poses using only the part of the current visual image that differs from the image taken at the previous trial. With this method, we do not need to detect the poses of all objects in the pile at every picking trial. Assuming a 3D vision sensor attached to the wrist of a manipulator, we first explain a method to determine the sensor pose that maximizes the visibility of the randomly stacked objects. We then explain a method for detecting the poses of the stacked objects. The effectiveness of our proposed approach is confirmed by experiments using a dual-arm manipulator with a 3D vision sensor and a two-fingered hand attached to the right and left wrists, respectively.
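The differencing idea can be sketched as masking out the unchanged part of the depth image between consecutive trials; the tolerance value and helper names below are illustrative assumptions.

```python
import numpy as np

def changed_mask(depth_prev, depth_curr, tol=0.005):
    """Pixels whose depth changed by more than tol (meters) between trials."""
    return np.abs(depth_curr - depth_prev) > tol

def roi_from_mask(mask):
    """Tight bounding box (rows, cols) of the changed region, or None."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # nothing changed: reuse all poses from the previous trial
    return (ys.min(), ys.max() + 1), (xs.min(), xs.max() + 1)
```

Pose detection is then rerun only inside the returned region, while poses outside it are carried over from the previous trial.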
In this research, we tackle the problem of picking an object from a randomly stacked pile. Since the complex physics of contact among objects and fingers makes it difficult to perform bin picking with a high success rate, we introduce a learning-based approach. To collect a sufficient number of training samples within a reasonable period of time, we use a physics simulator in which collision checking is approximated. In this paper, we first formulate learning-based robotic bin picking using a convolutional neural network (CNN). We also obtain the optimal grasping posture of a parallel-jaw gripper using the CNN. Finally, we show that the effect of the approximation introduced in collision checking is mitigated if we use an exact 3D model to generate the depth image of the pile as input to the CNN.
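As a concrete picture of the formulation, a success discriminator over depth images of the pile could look like the following PyTorch sketch; the architecture and layer sizes are assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class GraspSuccessCNN(nn.Module):
    """Predicts the success probability of a candidate grasp from a depth image."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 4 * 4, 64), nn.ReLU(),
            nn.Linear(64, 1),      # logit of picking success
        )

    def forward(self, depth):      # depth: (B, 1, H, W)
        return self.head(self.features(depth))
```

The optimal grasping posture would then be obtained by scoring a set of candidate postures (for example, depth crops rotated and centered on each candidate) and taking the highest-scoring one, with training labels supplied by picking trials in the physics simulator.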
This paper presents experimental results on learning-based randomized bin picking combined with iterative visual recognition. We use a random forest to predict whether a robot will successfully pick an object given depth images of the pile, taking into account collisions between a finger and neighboring objects. For the discriminator to be accurate, we estimate object poses by merging multiple depth images of the pile captured from different viewpoints with a depth sensor attached at the wrist. We show that, even if a pick is predicted to fail from a single depth image because of a large occluded area, it can finally be predicted as a success after merging multiple depth images. In addition, we show that the random forest can be trained with a small number of training samples.
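The merging and prediction steps can be sketched as follows with scikit-learn; the min-depth fusion assumes the views have already been reprojected into a common image frame, and the flattened-crop features are a stand-in for the paper's actual descriptors.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def merge_depth(depths):
    """Fuse registered depth images by keeping the nearest valid depth per pixel."""
    stack = np.stack(depths)
    stack = np.where(stack > 0, stack, np.inf)  # treat zeros as missing data
    merged = stack.min(axis=0)
    return np.where(np.isfinite(merged), merged, 0.0)

# X: flattened local depth crops around each grasp candidate; y: pick success (0/1)
clf = RandomForestClassifier(n_estimators=100)
# clf.fit(X_train, y_train)
# success_prob = clf.predict_proba(X_query)[:, 1]
```

Merging reduces the occluded area around a candidate, which is what allows a pick predicted to fail from a single view to flip to a predicted success.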
We present a novel approach to interactive auditory object analysis with a humanoid robot. The robot elicits sensory information by physically shaking visually indistinguishable plastic capsules and gathers the resulting audio signals from microphones embedded in its robotic ears. A neural network architecture learns from these signals to analyze properties of the containers' contents. Specifically, we evaluate material classification and weight prediction accuracy and demonstrate that the framework is fairly robust to real-world acoustic noise.
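A plausible shape for such an audio pipeline is a spectrogram front end feeding a small recurrent network with one head per property; the feature extraction, layer sizes, and the five-way material split below are illustrative assumptions only, not the paper's architecture.

```python
import numpy as np
import torch
import torch.nn as nn

def log_spectrogram(signal, n_fft=512, hop=256):
    """Log-magnitude spectrogram of the recorded shaking sound."""
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft, hop)]
    spec = np.abs(np.fft.rfft(np.asarray(frames), axis=1))
    return np.log1p(spec).astype(np.float32)    # shape: (time, freq)

class CapsuleContentNet(nn.Module):
    """Joint heads for content material classification and weight regression."""
    def __init__(self, n_freq=257, n_materials=5):
        super().__init__()
        self.encoder = nn.GRU(n_freq, 64, batch_first=True)
        self.material = nn.Linear(64, n_materials)  # classification head
        self.weight = nn.Linear(64, 1)              # weight regression head

    def forward(self, spec):                        # spec: (B, time, freq)
        _, h = self.encoder(spec)
        return self.material(h[-1]), self.weight(h[-1])
```

Both properties are predicted from the same shaking signal: the material head classifies while the weight head regresses a scalar from the shared recurrent encoding.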