In this work, we propose UPDesc, an unsupervised method for learning point descriptors for robust point cloud registration. Our work builds upon a recent supervised 3D CNN-based descriptor extraction framework, 3DSmoothNet, which leverages a voxel-based representation to parameterize the surrounding geometry of interest points. Instead of using a predefined fixed-size local support in voxelization, which potentially limits access to richer local geometric information, we propose to learn the support size in a data-driven manner. To this end, we design a differentiable voxelization module that can back-propagate gradients to the support size for its optimization. To optimize descriptor similarity, the prior 3D CNN work and other supervised methods require abundant correspondence labels or pose annotations of point clouds for crafting metric learning losses. In contrast, we show that unsupervised learning of descriptor similarity can be achieved by performing geometric registration within the network. Our learning objectives consider descriptor similarity both across and within point clouds without supervision. Through extensive experiments on point cloud registration benchmarks, we show that our learned descriptors yield superior performance over existing unsupervised methods.
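A minimal PyTorch sketch of the differentiable voxelization idea follows. It assumes a Gaussian soft assignment of neighboring points to voxel centers and a single learnable log-radius for the support size; the kernel, grid layout, and normalization are illustrative assumptions, not the exact UPDesc module.

```python
# Sketch: soft voxelization with a learnable support size. Because the
# assignment of points to voxel centers is a smooth Gaussian, gradients of a
# downstream descriptor loss flow back into the support radius.
import torch

class SoftVoxelization(torch.nn.Module):
    def __init__(self, grid_size=16, init_radius=0.3, sigma=0.05):
        super().__init__()
        # Learnable support radius (stored as log for positivity).
        self.log_radius = torch.nn.Parameter(torch.tensor(init_radius).log())
        self.grid_size = grid_size
        self.sigma = sigma
        # Normalized voxel-center coordinates in [-1, 1]^3.
        lin = torch.linspace(-1.0, 1.0, grid_size)
        centers = torch.stack(torch.meshgrid(lin, lin, lin, indexing="ij"), dim=-1)
        self.register_buffer("centers", centers.reshape(-1, 3))   # (G^3, 3)

    def forward(self, neighbors):
        # neighbors: (N, 3) points around an interest point, centered on it.
        radius = self.log_radius.exp()
        scaled = neighbors / radius                       # map support to [-1, 1]
        d2 = torch.cdist(self.centers, scaled) ** 2       # (G^3, N) squared distances
        weights = torch.exp(-d2 / (2 * self.sigma ** 2))  # Gaussian soft assignment
        occupancy = weights.sum(dim=1)                    # per-voxel soft density
        grid = occupancy.reshape(1, self.grid_size, self.grid_size, self.grid_size)
        return grid / (grid.max() + 1e-8)                 # normalized voxel grid

# The resulting (1, G, G, G) grid can be fed to a 3D CNN descriptor head;
# calling backward() on any loss over it updates log_radius.
vox = SoftVoxelization()
grid = vox(torch.randn(512, 3) * 0.2)
grid.sum().backward()
```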
Extracting robust and general 3D local features is key to downstream tasks such as point cloud registration and reconstruction. Existing learning-based local descriptors are either sensitive to rotation transformations or rely on classical handcrafted features which are neither general nor representative. In this paper, we introduce a new, yet conceptually simple, neural architecture, termed SpinNet, to extract local features which are rotationally invariant whilst sufficiently informative to enable accurate registration. A Spatial Point Transformer is first introduced to map the input local surface into a carefully designed cylindrical space, enabling end-to-end optimization with an SO(2) equivariant representation. A Neural Feature Extractor, which leverages powerful point-based and 3D cylindrical convolutional neural layers, is then utilized to derive a compact and representative descriptor for matching. Extensive experiments on both indoor and outdoor datasets demonstrate that SpinNet outperforms existing state-of-the-art techniques by a large margin. More critically, it has the best generalization ability across unseen scenarios with different sensor modalities. The code is available at https://github.com/QingyongHu/SpinNet.
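The cylindrical parameterization can be illustrated with a rough numpy sketch: after aligning a local patch with a reference axis, a rotation about that axis becomes a circular shift along the theta dimension of the binned volume (the SO(2) equivariance mentioned above). The frame construction, bin counts, and radius below are assumptions for illustration only.

```python
# Sketch: bin a local patch into a cylindrical (rho, theta, z) occupancy volume.
import numpy as np

def cylindrical_volume(patch, axis, radius=0.3, bins=(10, 20, 10)):
    """Bin a local patch (N, 3) into a (rho, theta, z) occupancy volume."""
    # Build an orthonormal frame whose z-axis is the given reference axis.
    z = axis / np.linalg.norm(axis)
    tmp = np.array([1.0, 0.0, 0.0]) if abs(z[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    x = np.cross(tmp, z); x /= np.linalg.norm(x)
    y = np.cross(z, x)
    local = patch @ np.stack([x, y, z], axis=1)           # rotate into local frame

    rho = np.linalg.norm(local[:, :2], axis=1)
    theta = np.arctan2(local[:, 1], local[:, 0])          # in [-pi, pi]
    height = local[:, 2]

    n_r, n_t, n_z = bins
    i = np.clip((rho / radius * n_r).astype(int), 0, n_r - 1)
    j = ((theta + np.pi) / (2 * np.pi) * n_t).astype(int) % n_t
    k = np.clip(((height + radius) / (2 * radius) * n_z).astype(int), 0, n_z - 1)

    vol = np.zeros(bins)
    np.add.at(vol, (i, j, k), 1.0)                        # occupancy counts
    return vol

# A rotation of the patch about `axis` only circularly shifts `vol` along the
# theta axis, which subsequent cylindrical convolutions (e.g., with circular
# padding) can absorb.
```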
In this paper, we propose a novel method named GP-Aligner to deal with the problem of non-rigid groupwise point set registration. Compared to previous non-learning approaches, our proposed method gains competitive advantages by leveraging the power of deep neural networks to effectively and efficiently align a large number of highly deformed 3D shapes with superior performance. Unlike most learning-based methods that use an explicit feature encoding network to extract per-shape features and their correlations, our model leverages a model-free learnable latent descriptor to characterize the group relationship. More specifically, for a given group we first define an optimizable Group Latent Descriptor (GLD) to characterize the groupwise relationship among a group of point sets. Each GLD is randomly initialized from a Gaussian distribution and then concatenated with the coordinates of each point of the associated point sets in the group. A neural network-based decoder is further constructed to predict the coherent drifts as the desired transformation from input groups of shapes to aligned groups of shapes. During the optimization process, GP-Aligner jointly updates all GLDs and the weight parameters of the decoder network towards the minimization of an unsupervised groupwise alignment loss. After optimization, for each group our model coherently drives each point set towards a middle, common position (shape) without specifying one as the target. GP-Aligner does not require large-scale training data and can directly align groups of point sets in a one-stage optimization process. GP-Aligner shows improvements in both accuracy and computational efficiency over state-of-the-art methods for groupwise point set registration. Moreover, GP-Aligner demonstrates great efficiency in aligning a large number of groups of real-world 3D shapes.
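A toy PyTorch sketch of the optimization loop described above is given below: one learnable group latent descriptor is concatenated to every point, a small MLP decoder predicts per-point drifts, and the latent and decoder weights are optimized jointly. The pairwise Chamfer term stands in for the paper's unsupervised groupwise alignment loss and is only an assumption, as are the network sizes.

```python
# Sketch: jointly optimize a Group Latent Descriptor (GLD) and a drift decoder.
import torch

def chamfer(a, b):
    d = torch.cdist(a, b)                              # (Na, Nb) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

point_sets = [torch.randn(256, 3) for _ in range(4)]   # one group of deformed shapes
gld = torch.nn.Parameter(torch.randn(64) * 0.01)       # Group Latent Descriptor
decoder = torch.nn.Sequential(
    torch.nn.Linear(3 + 64, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 3),                           # per-point drift
)
opt = torch.optim.Adam([gld, *decoder.parameters()], lr=1e-3)

for step in range(500):
    deformed = []
    for pts in point_sets:
        feat = torch.cat([pts, gld.expand(pts.shape[0], -1)], dim=1)
        deformed.append(pts + decoder(feat))           # drift each point set
    # Unsupervised groupwise loss: pull all deformed sets toward each other,
    # so the group converges to a common middle shape without a fixed target.
    loss = sum(chamfer(deformed[i], deformed[j])
               for i in range(len(deformed))
               for j in range(i + 1, len(deformed)))
    opt.zero_grad(); loss.backward(); opt.step()
```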
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data. In this paper, we resolve these two challenges simultaneously. First, we propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations. This representation not only makes the solution space well-constrained but also enables our method to be solved iteratively with a recurrent framework, which greatly reduces the difficulty of learning. Second, we introduce a differentiable loss function that measures the 3D shape similarity on projected multi-view 2D depth images, so that our full framework can be trained end-to-end without ground truth supervision. Extensive experiments on several different datasets demonstrate that our proposed method outperforms the previous state-of-the-art by a large margin. The source code is available at https://github.com/WanquanF/RMA-Net.
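The point-wise combination of rigid transformations can be sketched as follows: K axis-angle rotations and translations are blended per point with softmax weights. The parameter shapes and blending rule here are illustrative assumptions rather than the exact RMA-Net formulation.

```python
# Sketch: deform a point cloud by a per-point weighted blend of K rigid transforms.
import torch

def axis_angle_to_matrix(aa):
    """Rodrigues' formula: (K, 3) axis-angle vectors -> (K, 3, 3) rotations."""
    theta = aa.norm(dim=1, keepdim=True).clamp(min=1e-8)      # (K, 1) angles
    k = aa / theta                                            # unit rotation axes
    skew = torch.zeros(aa.shape[0], 3, 3)
    skew[:, 0, 1], skew[:, 0, 2] = -k[:, 2], k[:, 1]
    skew[:, 1, 0], skew[:, 1, 2] = k[:, 2], -k[:, 0]
    skew[:, 2, 0], skew[:, 2, 1] = -k[:, 1], k[:, 0]
    eye = torch.eye(3).expand_as(skew)
    s = theta.sin().unsqueeze(-1)
    c = theta.cos().unsqueeze(-1)
    return eye + s * skew + (1 - c) * (skew @ skew)

def blend_rigid(points, axis_angles, translations, logits):
    """points (N,3), K rigid transforms, logits (N,K) -> deformed points (N,3)."""
    R = axis_angle_to_matrix(axis_angles)                     # (K, 3, 3)
    w = torch.softmax(logits, dim=1)                          # per-point blend weights
    # Apply every rigid transform to every point: (K, N, 3).
    moved = torch.einsum("kij,nj->kni", R, points) + translations[:, None, :]
    return torch.einsum("nk,kni->ni", w, moved)               # weighted blend

pts = torch.randn(1024, 3)
deformed = blend_rigid(pts, torch.randn(8, 3) * 0.1,
                       torch.randn(8, 3) * 0.1, torch.randn(1024, 8))
```

Because both the transform parameters and the blend weights are differentiable, such a representation can be predicted by a network and refined iteratively in a recurrent fashion.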
Point cloud segmentation is a fundamental task in 3D vision. Despite recent progress on point cloud segmentation with the power of deep networks, current deep learning methods based on the clean-label assumption may fail with noisy labels. Yet, object class labels are often mislabeled in real-world point cloud datasets. In this work, we take the lead in solving this issue by proposing a novel Point Noise-Adaptive Learning (PNAL) framework. Compared to existing noise-robust methods on image tasks, our PNAL is noise-rate blind, to cope with the spatially variant noise rate problem specific to point clouds. Specifically, we propose a novel point-wise confidence selection to obtain reliable labels based on the historical predictions of each point. A novel cluster-wise label correction is proposed with a voting strategy to generate the best possible label, taking neighbor point correlations into consideration. We conduct extensive experiments to demonstrate the effectiveness of PNAL on both synthetic and real-world noisy datasets. In particular, even with 60% symmetric noisy labels, our proposed method produces much better results than its baseline counterpart without PNAL and is comparable to the ideal upper bound trained on a completely clean dataset. Moreover, we fully re-labeled the validation set of the popular but noisy real-world scene dataset ScanNetV2 to make it clean, for rigorous experiments and future research. Our code and data are available at https://shuquanye.com/PNAL_website/.
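A toy numpy sketch of the two components named in the abstract, point-wise confidence selection from historical predictions and cluster-wise label correction by voting, is given below. The history length, confidence threshold, and source of the geometric clusters are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch: select reliable points from prediction history, then vote per cluster.
import numpy as np

def confident_points(pred_history, threshold=0.8):
    """pred_history: (T, N) predicted labels over the last T epochs.
    A point is 'reliable' if one label dominates its recent predictions."""
    T, N = pred_history.shape
    reliable = np.zeros(N, dtype=bool)
    stable_label = np.zeros(N, dtype=int)
    for n in range(N):
        labels, counts = np.unique(pred_history[:, n], return_counts=True)
        best = counts.argmax()
        stable_label[n] = labels[best]
        reliable[n] = counts[best] / T >= threshold
    return reliable, stable_label

def correct_cluster_labels(clusters, reliable, stable_label, noisy_label):
    """Vote within each geometric cluster; only reliable points cast votes."""
    corrected = noisy_label.copy()
    for idx in clusters:                           # idx: point indices of one cluster
        voters = stable_label[idx][reliable[idx]]
        if voters.size:                            # overwrite the whole cluster
            corrected[idx] = np.bincount(voters).argmax()
    return corrected
```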
The performance of surface registration relies heavily on the metric used to measure the alignment error between the source and target shapes. Traditionally, such a metric is based on the point-to-point or point-to-plane distance from points on the source surface to their closest points on the target surface, which is susceptible to failure due to the instability of the closest-point correspondence. In this paper, we propose a novel metric based on the intersection points between the two shapes and a random straight line, which does not assume a specific correspondence. We verify the effectiveness of this metric through extensive experiments, including its direct optimization for a single registration problem as well as unsupervised learning for a set of registration problems. The results demonstrate that algorithms utilizing our proposed metric outperform state-of-the-art optimization-based and unsupervised learning-based methods.
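The ingredients of such a metric can be sketched with numpy: cast random lines through the scene, collect their intersection points with each triangle mesh via the Moller-Trumbore test, and compare the intersections along each line. How the paper pairs and aggregates intersection points is not reproduced here; the sorted-parameter comparison below is only a stand-in under that assumption.

```python
# Sketch: a correspondence-free alignment error from random-line intersections.
import numpy as np

def line_mesh_intersections(origin, direction, verts, faces, eps=1e-9):
    """Return sorted line parameters t where the line hits the triangle mesh."""
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = np.einsum("ij,ij->i", e1, p)
    ok = np.abs(det) > eps
    inv = np.where(ok, 1.0 / np.where(ok, det, 1.0), 0.0)
    s = origin - v0
    u = np.einsum("ij,ij->i", s, p) * inv
    q = np.cross(s, e1)
    v = np.einsum("j,ij->i", direction, q) * inv
    t = np.einsum("ij,ij->i", e2, q) * inv
    hit = ok & (u >= 0) & (v >= 0) & (u + v <= 1)
    return np.sort(t[hit])

def random_line_metric(src_verts, tgt_verts, src_faces, tgt_faces, n_lines=64, rng=None):
    """Average discrepancy between paired intersections over random lines."""
    rng = np.random.default_rng(rng)
    total, count = 0.0, 0
    for _ in range(n_lines):
        origin = rng.normal(size=3)
        direction = rng.normal(size=3)
        direction /= np.linalg.norm(direction)
        ts = line_mesh_intersections(origin, direction, src_verts, src_faces)
        tt = line_mesh_intersections(origin, direction, tgt_verts, tgt_faces)
        m = min(len(ts), len(tt))
        if m:                                      # compare sorted, paired hits
            total += np.abs(ts[:m] - tt[:m]).mean()
            count += 1
    return total / max(count, 1)
```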