Unconstrained Matching of 2D and 3D Descriptors for 6-DOF Pose Estimation

61 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Uzair Nadeem

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Uzair Nadeem - Mohammed Bennamoun - Roberto Togneri

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This paper proposes a novel concept to directly match feature descriptors extracted from 2D images with feature descriptors extracted from 3D point clouds. We use this concept to directly localize images in a 3D point cloud. We generate a dataset of matching 2D and 3D points and their corresponding feature descriptors, which is used to learn a Descriptor-Matcher classifier. To localize the pose of an image at test time, we extract keypoints and feature descriptors from the query image. The trained Descriptor-Matcher is then used to match the features from the image and the point cloud. The locations of the matched features are used in a robust pose estimation algorithm to predict the location and orientation of the query image. We carried out an extensive evaluation of the proposed method for indoor and outdoor scenarios and with different types of point clouds to verify the feasibility of our approach. Experimental results demonstrate that direct matching of feature descriptors from images and point clouds is not only a viable idea but can also be reliably used to estimate the 6-DOF poses of query cameras in any type of 3D point cloud in an unconstrained manner with high precision.

قيم البحث

125 - Hongsen Liu , Yang Cong , Yandong Tang 2019

We propose a single-shot method for simultaneous 3D object segmentation and 6-DOF pose estimation in pure 3D point clouds scenes based on a consensus that emph{one point only belongs to one object}, i.e., each point has the potential power to predict the 6-DOF pose of its corresponding object. Unlike the recently proposed methods of the similar task, which rely on 2D detectors to predict the projection of 3D corners of the 3D bounding boxes and the 6-DOF pose must be estimated by a PnP like spatial transformation method, ours is concise enough not to require additional spatial transformation between different dimensions. Due to the lack of training data for many objects, the recently proposed 2D detection methods try to generate training data by using rendering engine and achieve good results. However, rendering in 3D space along with 6-DOF is relatively difficult. Therefore, we propose an augmented reality technology to generate the training data in semi-virtual reality 3D space. The key component of our method is a multi-task CNN architecture that can simultaneously predicts the 3D object segmentation and 6-DOF pose estimation in pure 3D point clouds. For experimental evaluation, we generate expanded training data for two state-of-the-arts 3D object datasets cite{PLCHF}cite{TLINEMOD} by using Augmented Reality technology (AR). We evaluate our proposed method on the two datasets. The results show that our method can be well generalized into multiple scenarios and provide performance comparable to or better than the state-of-the-arts.

الرؤية الحاسوبية وتمييز الأنماط

Latent-Class Hough Forests for 6 DoF Object Pose Estimation

78 - Rigas Kouskouridas , Alykhan Tejani , Andreas Doumanoglou 2016

In this paper we present Latent-Class Hough Forests, a method for object detection and 6 DoF pose estimation in heavily cluttered and occluded scenarios. We adapt a state of the art template matching feature into a scale-invariant patch descriptor an d integrate it into a regression forest using a novel template-based split function. We train with positive samples only and we treat class distributions at the leaf nodes as latent variables. During testing we infer by iteratively updating these distributions, providing accurate estimation of background clutter and foreground occlusions and, thus, better detection rate. Furthermore, as a by-product, our Latent-Class Hough Forests can provide accurate occlusion aware segmentation masks, even in the multi-instance scenario. In addition to an existing public dataset, which contains only single-instance sequences with large amounts of clutter, we have collected two, more challenging, datasets for multiple-instance detection containing heavy 2D and 3D clutter as well as foreground occlusions. We provide extensive experiments on the various parameters of the framework such as patch size, number of trees and number of iterations to infer class distributions at test time. We also evaluate the Latent-Class Hough Forests on all datasets where we outperform state of the art methods.

الرؤية الحاسوبية وتمييز الأنماط

Active Learning for Bayesian 3D Hand Pose Estimation

205 - Razvan Caramalau , Binod Bhattarai , Tae-Kyun Kim 2020

We propose a Bayesian approximation to a deep learning architecture for 3D hand pose estimation. Through this framework, we explore and analyse the two types of uncertainties that are influenced either by data or by the learning capability. Furthermo re, we draw comparisons against the standard estimator over three popular benchmarks. The first contribution lies in outperforming the baseline while in the second part we address the active learning application. We also show that with a newly proposed acquisition function, our Bayesian 3D hand pose estimator obtains lowest errors with the least amount of data. The underlying code is publicly available at https://github.com/razvancaramalau/al_bhpe.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات

LCD: Learned Cross-Domain Descriptors for 2D-3D Matching

353 - Quang-Hieu Pham , Mikaela Angelina Uy , Binh-Son Hua 2019

In this work, we present a novel method to learn a local cross-domain descriptor for 2D image and 3D point cloud matching. Our proposed method is a dual auto-encoder neural network that maps 2D and 3D input into a shared latent space representation. We show that such local cross-domain descriptors in the shared embedding are more discriminative than those obtained from individual training in 2D and 3D domains. To facilitate the training process, we built a new dataset by collecting $approx 1.4$ millions of 2D-3D correspondences with various lighting conditions and settings from publicly available RGB-D scenes. Our descriptor is evaluated in three main experiments: 2D-3D matching, cross-domain retrieval, and sparse-to-dense depth estimation. Experimental results confirm the robustness of our approach as well as its competitive performance not only in solving cross-domain tasks but also in being able to generalize to solve sole 2D and 3D tasks. Our dataset and code are released publicly at url{https://hkust-vgd.github.io/lcd}.

الرؤية الحاسوبية وتمييز الأنماط

Performance Analysis and Robustification of Single-query 6-DoF Camera Pose Estimation

96 - Junsheng Fu , Said Pertuz , Jiri Matas 2018

We consider a single-query 6-DoF camera pose estimation with reference images and a point cloud, i.e. the problem of estimating the position and orientation of a camera by using reference images and a point cloud. In this work, we perform a systemati c comparison of three state-of-the-art strategies for 6-DoF camera pose estimation, i.e. feature-based, photometric-based and mutual-information-based approaches. The performance of the studied methods is evaluated on two standard datasets in terms of success rate, translation error and max orientation error. Building on the results analysis, we propose a hybrid approach that combines feature-based and mutual-information-based pose estimation methods since it provides complementary properties for pose estimation. Experiments show that (1) in cases with large environmental variance, the hybrid approach outperforms feature-based and mutual-information-based approaches by an average of 25.1% and 5.8% in terms of success rate, respectively; (2) in cases where query and reference images are captured at similar imaging conditions, the hybrid approach performs similarly as the feature-based approach, but outperforms both photometric-based and mutual-information-based approaches with a clear margin; (3) the feature-based approach is consistently more accurate than mutual-information-based and photometric-based approaches when at least 4 consistent matching points are found between the query and reference images.

الرؤية الحاسوبية وتمييز الأنماط