Do you want to publish a course? Click here

CullNet: Calibrated and Pose Aware Confidence Scores for Object Pose Estimation

158   0   0.0 ( 0 )
 Added by Kartik Gupta
 Publication date 2019
and research's language is English




Ask ChatGPT about the research

We present a new approach for a single view, image-based object pose estimation. Specifically, the problem of culling false positives among several pose proposal estimates is addressed in this paper. Our proposed approach targets the problem of inaccurate confidence values predicted by CNNs which is used by many current methods to choose a final object pose prediction. We present a network called CullNet, solving this task. CullNet takes pairs of pose masks rendered from a 3D model and cropped regions in the original image as input. This is then used to calibrate the confidence scores of the pose proposals. This new set of confidence scores is found to be significantly more reliable for accurate object pose estimation as shown by our results. Our experimental results on multiple challenging datasets (LINEMOD and Occlusion LINEMOD) reflects the utility of our proposed method. Our overall pose estimation pipeline outperforms state-of-the-art object pose estimation methods on these standard object pose estimation datasets. Our code is publicly available on https://github.com/kartikgupta-at-anu/CullNet.



rate research

Read More

In this work, we present a modified fuzzy decision forest for real-time 3D object pose estimation based on typical template representation. We employ an extra preemptive background rejector node in the decision forest framework to terminate the examination of background locations as early as possible, result in a significantly improvement on efficiency. Our approach is also scalable to large dataset since the tree structure naturally provides a logarithm time complexity to the number of objects. Finally we further reduce the validation stage with a fast breadth-first scheme. The results show that our approach outperform the state-of-the-arts on the efficiency while maintaining a comparable accuracy.
We propose a benchmark for 6D pose estimation of a rigid object from a single RGB-D input image. The training data consists of a texture-mapped 3D object model or images of the object in known 6D poses. The benchmark comprises of: i) eight datasets in a unified format that cover different practical scenarios, including two new datasets focusing on varying lighting conditions, ii) an evaluation methodology with a pose-error function that deals with pose ambiguities, iii) a comprehensive evaluation of 15 diverse recent methods that captures the status quo of the field, and iv) an online evaluation system that is open for continuous submission of new results. The evaluation shows that methods based on point-pair features currently perform best, outperforming template matching methods, learning-based methods and methods based on 3D local features. The project website is available at bop.felk.cvut.cz.
In this work, we propose a new solution to 3D human pose estimation in videos. Instead of directly regressing the 3D joint locations, we draw inspiration from the human skeleton anatomy and decompose the task into bone direction prediction and bone length prediction, from which the 3D joint locations can be completely derived. Our motivation is the fact that the bone lengths of a human skeleton remain consistent across time. This promotes us to develop effective techniques to utilize global information across all the frames in a video for high-accuracy bone length prediction. Moreover, for the bone direction prediction network, we propose a fully-convolutional propagating architecture with long skip connections. Essentially, it predicts the directions of different bones hierarchically without using any time-consuming memory units e.g. LSTM). A novel joint shift loss is further introduced to bridge the training of the bone length and bone direction prediction networks. Finally, we employ an implicit attention mechanism to feed the 2D keypoint visibility scores into the model as extra guidance, which significantly mitigates the depth ambiguity in many challenging poses. Our full model outperforms the previous best results on Human3.6M and MPI-INF-3DHP datasets, where comprehensive evaluation validates the effectiveness of our model.
Vision based object grasping and manipulation in robotics require accurate estimation of objects 6D pose. The 6D pose estimation has received significant attention in computer vision community and multiple datasets and evaluation metrics have been proposed. However, the existing metrics measure how well two geometrical surfaces are aligned - ground truth vs. estimated pose - which does not directly measure how well a robot can perform the task with the given estimate. In this work we propose a probabilistic metric that directly measures success in robotic tasks. The evaluation metric is based on non-parametric probability density that is estimated from samples of a real physical setup. During the pose evaluation stage the physical setup is not needed. The evaluation metric is validated in controlled experiments and a new pose estimation dataset of industrial parts is introduced. The experimental results with the parts confirm that the proposed evaluation metric better reflects the true performance in robotics than the existing metrics.
3D hand-object pose estimation is an important issue to understand the interaction between human and environment. Current hand-object pose estimation methods require detailed 3D labels, which are expensive and labor-intensive. To tackle the problem of data collection, we propose a semi-supervised 3D hand-object pose estimation method with two key techniques: pose dictionary learning and an object-oriented coordinate system. The proposed pose dictionary learning module can distinguish infeasible poses by reconstruction error, enabling unlabeled data to provide supervision signals. The proposed object-oriented coordinate system can make 3D estimations equivariant to the camera perspective. Experiments are conducted on FPHA and HO-3D datasets. Our method reduces estimation error by 19.5% / 24.9% for hands/objects compared to straightforward use of labeled data on FPHA and outperforms several baseline methods. Extensive experiments also validate the robustness of the proposed method.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا