No Arabic abstract
Multi-view point cloud registration is a hot topic in the communities of multimedia technology and artificial intelligence (AI). In this paper, we propose a framework to reconstruct the 3D models by the multi-view point cloud registration algorithm with adaptive convergence threshold, and subsequently apply it to 3D model retrieval. The iterative closest point (ICP) algorithm is implemented combining with the motion average algorithm for the registration of multi-view point clouds. After the registration process, we design applications for 3D model retrieval. The geometric saliency map is computed based on the vertex curvature. The test facial triangle is then generated based on the saliency map, which is applied to compare with the standard facial triangle. The face and non-face models are then discriminated. The experiments and comparisons prove the effectiveness of the proposed framework.
Point cloud registration is a fundamental problem in 3D computer vision. In this paper, we cast point cloud registration into a planning problem in reinforcement learning, which can seek the transformation between the source and target point clouds through trial and error. By modeling the point cloud registration process as a Markov decision process (MDP), we develop a latent dynamic model of point clouds, consisting of a transformation network and evaluation network. The transformation network aims to predict the new transformed feature of the point cloud after performing a rigid transformation (i.e., action) on it while the evaluation network aims to predict the alignment precision between the transformed source point cloud and target point cloud as the reward signal. Once the dynamic model of the point cloud is trained, we employ the cross-entropy method (CEM) to iteratively update the planning policy by maximizing the rewards in the point cloud registration process. Thus, the optimal policy, i.e., the transformation between the source and target point clouds, can be obtained via gradually narrowing the search space of the transformation. Experimental results on ModelNet40 and 7Scene benchmark datasets demonstrate that our method can yield good registration performance in an unsupervised manner.
Extracting robust and general 3D local features is key to downstream tasks such as point cloud registration and reconstruction. Existing learning-based local descriptors are either sensitive to rotation transformations, or rely on classical handcrafted features which are neither general nor representative. In this paper, we introduce a new, yet conceptually simple, neural architecture, termed SpinNet, to extract local features which are rotationally invariant whilst sufficiently informative to enable accurate registration. A Spatial Point Transformer is first introduced to map the input local surface into a carefully designed cylindrical space, enabling end-to-end optimization with SO(2) equivariant representation. A Neural Feature Extractor which leverages the powerful point-based and 3D cylindrical convolutional neural layers is then utilized to derive a compact and representative descriptor for matching. Extensive experiments on both indoor and outdoor datasets demonstrate that SpinNet outperforms existing state-of-the-art techniques by a large margin. More critically, it has the best generalization ability across unseen scenarios with different sensor modalities. The code is available at https://github.com/QingyongHu/SpinNet.
Deep learning-based point cloud registration models are often generalized from extensive training over a large volume of data to learn the ability to predict the desired geometric transformation to register 3D point clouds. In this paper, we propose a meta-learning based 3D registration model, named 3D Meta-Registration, that is capable of rapidly adapting and well generalizing to new 3D registration tasks for unseen 3D point clouds. Our 3D Meta-Registration gains a competitive advantage by training over a variety of 3D registration tasks, which leads to an optimized model for the best performance on the distribution of registration tasks including potentially unseen tasks. Specifically, the proposed 3D Meta-Registration model consists of two modules: 3D registration learner and 3D registration meta-learner. During the training, the 3D registration learner is trained to complete a specific registration task aiming to determine the desired geometric transformation that aligns the source point cloud with the target one. In the meantime, the 3D registration meta-learner is trained to provide the optimal parameters to update the 3D registration learner based on the learned task distribution. After training, the 3D registration meta-learner, which is learned with the optimized coverage of distribution of 3D registration tasks, is able to dynamically update 3D registration learners with desired parameters to rapidly adapt to new registration tasks. We tested our model on synthesized dataset ModelNet and FlyingThings3D, as well as real-world dataset KITTI. Experimental results demonstrate that 3D Meta-Registration achieves superior performance over other previous techniques (e.g. FlowNet3D).
To eliminate the problems of large dimensional differences, big semantic gap, and mutual interference caused by hybrid features, in this paper, we propose a novel Multi-Features Guidance Network for partial-to-partial point cloud registration(MFG). The proposed network mainly includes four parts: keypoints feature extraction, correspondences searching, correspondences credibility computation, and SVD, among which correspondences searching and correspondence credibility computation are the cores of the network. Unlike the previous work, we utilize the shape features and the spatial coordinates to guide correspondences search independently and fusing the matching results to obtain the final matching matrix. In the correspondences credibility computation module, based on the conflicted relationship between the features matching matrix and the coordinates matching matrix, we score the reliability for each correspondence, which can reduce the impact of mismatched or non-matched points. Experimental results show that our network outperforms the current state-of-the-art while maintaining computational efficiency.
3D object detection based on LiDAR-camera fusion is becoming an emerging research theme for autonomous driving. However, it has been surprisingly difficult to effectively fuse both modalities without information loss and interference. To solve this issue, we propose a single-stage multi-view fusion framework that takes LiDAR birds-eye view, LiDAR range view and camera view images as inputs for 3D object detection. To effectively fuse multi-view features, we propose an attentive pointwise fusion (APF) module to estimate the importance of the three sources with attention mechanisms that can achieve adaptive fusion of multi-view features in a pointwise manner. Furthermore, an attentive pointwise weighting (APW) module is designed to help the network learn structure information and point feature importance with two extra tasks, namely, foreground classification and center regression, and the predicted foreground probability is used to reweight the point features. We design an end-to-end learnable network named MVAF-Net to integrate these two components. Our evaluations conducted on the KITTI 3D object detection datasets demonstrate that the proposed APF and APW modules offer significant performance gains. Moreover, the proposed MVAF-Net achieves the best performance among all single-stage fusion methods and outperforms most two-stage fusion methods, achieving the best trade-off between speed and accuracy on the KITTI benchmark.