No Arabic abstract
We propose a self-supervised method for partial point set registration. While recent proposed learning-based methods have achieved impressive registration performance on the full shape observations, these methods mostly suffer from performance degradation when dealing with partial shapes. To bridge the performance gaps between partial point set registration with full point set registration, we proposed to incorporate a shape completion network to benefit the registration process. To achieve this, we design a latent code for each pair of shapes, which can be regarded as a geometric encoding of the target shape. By doing so, our model does need an explicit feature embedding network to learn the feature encodings. More importantly, both our shape completion network and the point set registration network take the shared latent codes as input, which are optimized along with the parameters of two decoder networks in the training process. Therefore, the point set registration process can thus benefit from the joint optimization process of latent codes, which are enforced to represent the information of full shape instead of partial ones. In the inference stage, we fix the network parameter and optimize the latent codes to get the optimal shape completion and registration results. Our proposed method is pure unsupervised and does not need any ground truth supervision. Experiments on the ModelNet40 dataset demonstrate the effectiveness of our model for partial point set registration.
Matching articulated shapes represented by voxel-sets reduces to maximal sub-graph isomorphism when each set is described by a weighted graph. Spectral graph theory can be used to map these graphs onto lower dimensional spaces and match shapes by aligning their embeddings in virtue of their invariance to change of pose. Classical graph isomorphism schemes relying on the ordering of the eigenvalues to align the eigenspaces fail when handling large data-sets or noisy data. We derive a new formulation that finds the best alignment between two congruent $K$-dimensional sets of points by selecting the best subset of eigenfunctions of the Laplacian matrix. The selection is done by matching eigenfunction signatures built with histograms, and the retained set provides a smart initialization for the alignment problem with a considerable impact on the overall performance. Dense shape matching casted into graph matching reduces then, to point registration of embeddings under orthogonal transformations; the registration is solved using the framework of unsupervised clustering and the EM algorithm. Maximal subset matching of non identical shapes is handled by defining an appropriate outlier class. Experimental results on challenging examples show how the algorithm naturally treats changes of topology, shape variations and different sampling densities.
Aligning partial views of a scene into a single whole is essential to understanding ones environment and is a key component of numerous robotics tasks such as SLAM and SfM. Recent approaches have proposed end-to-end systems that can outperform traditional methods by leveraging pose supervision. However, with the rising prevalence of cameras with depth sensors, we can expect a new stream of raw RGB-D data without the annotations needed for supervision. We propose UnsupervisedR&R: an end-to-end unsupervised approach to learning point cloud registration from raw RGB-D video. The key idea is to leverage differentiable alignment and rendering to enforce photometric and geometric consistency between frames. We evaluate our approach on indoor scene datasets and find that we outperform existing traditional approaches with classic and learned descriptors while being competitive with supervised geometric point cloud registration approaches.
Point cloud registration is the process of aligning a pair of point sets via searching for a geometric transformation. Unlike classical optimization-based methods, recent learning-based methods leverage the power of deep learning for registering a pair of point sets. In this paper, we propose to develop a novel model that organically integrates the optimization to learning, aiming to address the technical challenges in 3D registration. More specifically, in addition to the deep transformation decoding network, our framework introduce an optimizable deep underline{S}patial underline{C}orrelation underline{R}epresentation (SCR) feature. The SCR feature and weights of the transformation decoder network are jointly updated towards the minimization of an unsupervised alignment loss. We further propose an adaptive Chamfer loss for aligning partial shapes. To verify the performance of our proposed method, we conducted extensive experiments on the ModelNet40 dataset. The results demonstrate that our method achieves significantly better performance than the previous state-of-the-art approaches in the full/partial point set registration task.
Cross-lingual alignment of word embeddings play an important role in knowledge transfer across languages, for improving machine translation and other multi-lingual applications. Current unsupervised approaches rely on similarities in geometric structure of word embedding spaces across languages, to learn structure-preserving linear transformations using adversarial networks and refinement strategies. However, such techniques, in practice, tend to suffer from instability and convergence issues, requiring tedious fine-tuning for precise parameter setting. This paper proposes BioSpere, a novel framework for unsupervised mapping of bi-lingual word embeddings onto a shared vector space, by combining adversarial initialization and refinement procedure with point set registration algorithm used in image processing. We show that our framework alleviates the shortcomings of existing methodologies, and is relatively invariant to variable adversarial learning performance, depicting robustness in terms of parameter choices and training losses. Experimental evaluation on parallel dictionary induction task demonstrates state-of-the-art results for our framework on diverse language pairs.
In this work, we propose UPDesc, an unsupervised method to learn point descriptors for robust point cloud registration. Our work builds upon a recent supervised 3D CNN-based descriptor extraction framework, namely, 3DSmoothNet, which leverages a voxel-based representation to parameterize the surrounding geometry of interest points. Instead of using a predefined fixed-size local support in voxelization, which potentially limits the access of richer local geometry information, we propose to learn the support size in a data-driven manner. To this end, we design a differentiable voxelization module that can back-propagate gradients to the support size optimization. To optimize descriptor similarity, the prior 3D CNN work and other supervised methods require abundant correspondence labels or pose annotations of point clouds for crafting metric learning losses. Differently, we show that unsupervised learning of descriptor similarity can be achieved by performing geometric registration in networks. Our learning objectives consider descriptor similarity both across and within point clouds without supervision. Through extensive experiments on point cloud registration benchmarks, we show that our learned descriptors yield superior performance over existing unsupervised methods.