In large-scale SLAM for autonomous driving and mobile robotics, 3D point cloud based place recognition has attracted significant research interest due to its robustness to drastic daytime and weather variation. However, obtaining high-quality real-world point cloud data for training place recognition models, along with ground-truth poses for registration, is time-consuming and labor-intensive. To this end, a novel registration-aided 3D domain adaptation network for point cloud based place recognition is proposed. A structure-aware registration network is introduced to help learn features with geometric information, and it estimates a 6-DoF pose between two point clouds with partial overlap. The model is trained on a synthetic virtual LiDAR dataset generated from GTA-V with diverse weather and daytime conditions, and domain adaptation to the real-world domain is implemented by aligning global features. Our model outperforms, or performs comparably to, state-of-the-art 3D place recognition baselines on the real-world Oxford RobotCar dataset, and registration results are visualized on the virtual dataset.
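The abstract does not spell out how the 6-DoF pose is recovered; the following is a minimal sketch, assuming the network ultimately reduces to the classical closed-form step that most correspondence-based registration heads end with: the weighted Kabsch algorithm. The function name, the use of NumPy, and the confidence weights are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def kabsch_pose(src, tgt, weights=None):
    """Estimate R, t such that R @ src[i] + t approximates tgt[i].

    src, tgt: (N, 3) arrays of corresponding points (assumed given here;
    in a learned pipeline they would come from predicted correspondences).
    weights:  optional (N,) per-correspondence confidences.
    """
    if weights is None:
        weights = np.ones(len(src))
    w = weights / weights.sum()
    src_bar = (w[:, None] * src).sum(0)        # weighted centroids
    tgt_bar = (w[:, None] * tgt).sum(0)
    src_c, tgt_c = src - src_bar, tgt - tgt_bar
    H = (w[:, None] * src_c).T @ tgt_c         # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_bar - R @ src_bar
    return R, t
```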
Unlike its image-based counterpart, point cloud based retrieval for place recognition has remained a largely unexplored and unsolved problem. This is mainly due to the difficulty of extracting local feature descriptors from a point cloud that can subsequently be encoded into a global descriptor for the retrieval task. In this paper, we propose PointNetVLAD, which leverages the recent success of deep networks to solve point cloud based retrieval for place recognition. Specifically, PointNetVLAD is a combination and modification of the existing PointNet and NetVLAD, allowing end-to-end training and inference to extract a global descriptor from a given 3D point cloud. Furthermore, we propose the lazy triplet and quadruplet loss functions, which yield more discriminative and generalizable global descriptors for the retrieval task. We create benchmark datasets for point cloud based retrieval for place recognition, and experimental results on these datasets show the feasibility of PointNetVLAD. Our code and links to the benchmark dataset downloads are available on our project website: http://github.com/mikacuy/pointnetvlad/
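To make the "lazy" idea concrete, here is a minimal PyTorch sketch of a lazy triplet loss: instead of summing the hinge over all negatives, only the hardest negative contributes to the loss. The tensor shapes, distance choice, and margin value are illustrative assumptions rather than the paper's exact configuration.

```python
import torch

def lazy_triplet_loss(query, positives, negatives, margin=0.5):
    """query: (D,); positives: (P, D); negatives: (N, D) global descriptors."""
    d_pos = torch.norm(positives - query, dim=1).min()  # closest positive
    d_neg = torch.norm(negatives - query, dim=1)        # distances to all negatives
    hinge = torch.clamp(margin + d_pos - d_neg, min=0.0)
    return hinge.max()  # "lazy": only the hardest negative is penalized
```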
Point cloud based networks have attracted great attention in 3D object classification, segmentation, and indoor scene semantic parsing. For face recognition, however, 3D methods that directly consume point clouds as input are still under study. Two main factors account for this: one is how to obtain discriminative face representations from 3D point clouds with a deep network; the other is the lack of large 3D training datasets. To address these problems, a data-free 3D face recognition method is proposed that trains a deep point cloud network using only synthetic data generated from a statistical 3D Morphable Model. To ease the distribution mismatch between model-generated data and real faces, different point sampling methods are used in the training and test phases. In this paper, we propose a curvature-aware point sampling (CPS) strategy that replaces the original furthest point sampling (FPS) to hierarchically down-sample feature-sensitive points, which are crucial for passing and aggregating features through the network. A PointNet++-like network is used to extract face features directly from point clouds. The experimental results show that the network trained on generated data generalizes well to real 3D faces. Fine-tuning on a small portion of FRGCv2.0 and Bosphorus, which include real faces in different poses and expressions, further improves recognition accuracy.
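The abstract does not define CPS in full; below is a hedged sketch of one plausible instance: score each point by a surface-variation proxy (the smallest eigenvalue ratio of its neighborhood covariance, a standard curvature estimate) and keep the highest-scoring points. The neighborhood size k and the top-k selection are assumptions for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

def curvature_aware_sample(points, n_samples, k=16):
    """points: (N, 3); returns indices of n_samples curvature-salient points."""
    tree = cKDTree(points)
    _, nbr_idx = tree.query(points, k=k)             # k nearest neighbors per point
    scores = np.empty(len(points))
    for i, idx in enumerate(nbr_idx):
        nbrs = points[idx] - points[idx].mean(0)
        evals = np.linalg.eigvalsh(nbrs.T @ nbrs)    # eigenvalues, ascending
        scores[i] = evals[0] / (evals.sum() + 1e-9)  # "surface variation" proxy
    return np.argsort(-scores)[:n_samples]
```

High scores mark creases and curved regions (e.g., the nose ridge on a face scan), which is the intuition behind preferring them to uniformly spread FPS samples.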
Autonomous driving and Simultaneous Localization and Mapping (SLAM) are becoming increasingly important in the real world, and point cloud based large-scale place recognition is a core component of both. Previous place recognition methods have achieved acceptable performance by treating the task as a point cloud retrieval problem. However, they all suffer from a common defect: they cannot handle rotated point clouds, a situation that commonly arises, e.g., when viewpoints or motorcycle types are changed. To tackle this issue, we propose an Attentive Rotation Invariant Convolution (ARIConv). ARIConv adopts three kinds of Rotation Invariant Features (RIFs) in its structure: Spherical Signals (SS), Individual-Local Rotation Invariant Features (ILRIF), and Group-Local Rotation Invariant Features (GLRIF). These are used to learn rotation-invariant convolutional kernels, which are robust for learning rotation-invariant point cloud features. Moreover, to highlight pivotal RIFs, we inject an attentive module into ARIConv that assigns different importance to different RIFs when learning kernels. Finally, building on ARIConv, we construct a DenseNet-like network architecture to learn rotation-insensitive global descriptors for retrieval. We experimentally demonstrate that our model achieves state-of-the-art performance on the large-scale place recognition task when point cloud scans are rotated, and achieves results comparable to most existing methods on the original, non-rotated datasets.
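The exact RIF definitions are specific to the paper; the sketch below only illustrates the underlying invariance principle that such features rely on: pairwise distances, distances to a local centroid, and angles measured at the centroid are unchanged by any global rotation of the cloud. Function names and the feature set are assumptions.

```python
import numpy as np

def pairwise_rifs(p, n, centroid):
    """Rotation-invariant features for point p and neighbor n (3-vectors)."""
    def angle(u, v):
        c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
        return np.arccos(np.clip(c, -1.0, 1.0))
    return np.array([
        np.linalg.norm(p - n),              # neighbor distance
        np.linalg.norm(p - centroid),       # distance to local centroid
        np.linalg.norm(n - centroid),
        angle(p - centroid, n - centroid),  # angle at the centroid
    ])
```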
Domain Adaptation (DA) approaches have achieved significant improvements in a wide range of machine learning and computer vision tasks (e.g., classification, detection, and segmentation). However, as far as we are aware, few methods perform domain adaptation directly on 3D point cloud data. The unique challenge of point cloud data lies in its abundant spatial geometric information, with the semantics of the whole object contributed by its regional geometric structures. Consequently, most general-purpose DA methods, which strive for global feature alignment and ignore local geometric information, are not suitable for 3D domain alignment. In this paper, we propose a novel 3D Domain Adaptation Network for point cloud data (PointDAN). PointDAN jointly aligns global and local features at multiple levels. For local alignment, we propose a Self-Adaptive (SA) node module with an adjusted receptive field to model the discriminative local structures for aligning domains. To represent hierarchically scaled features, a node-attention module is further introduced to weight the relationships of SA nodes across objects and domains. For global alignment, an adversarial training strategy is employed to learn and align global features across domains. Since there is no common evaluation benchmark for the 3D point cloud DA scenario, we build a general benchmark (i.e., PointDA-10) extracted from three popular 3D object/scene datasets (i.e., ModelNet, ShapeNet, and ScanNet) for cross-domain 3D object classification. Extensive experiments on PointDA-10 illustrate the superiority of our model over state-of-the-art general-purpose DA methods.
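As a minimal sketch of the adversarial global-alignment ingredient, the DANN-style pattern below reverses gradients between the feature extractor and a domain classifier, so the features become indistinguishable across source and target. This is a generic instance, not PointDAN's exact module; the layer sizes and the lambda schedule are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        # Flip (and scale) gradients flowing back into the feature network.
        return -ctx.lam * grad_out, None

class DomainClassifier(nn.Module):
    def __init__(self, feat_dim=1024, lam=1.0):
        super().__init__()
        self.lam = lam
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 2),  # source vs. target domain logits
        )

    def forward(self, global_feat):
        return self.net(GradReverse.apply(global_feat, self.lam))
```

Training then adds a standard cross-entropy domain loss on these logits to the task loss; the reversed gradient pushes the backbone toward domain-invariant global features.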
Extracting robust and general 3D local features is key to downstream tasks such as point cloud registration and reconstruction. Existing learning-based local descriptors are either sensitive to rotation transformations or rely on classical handcrafted features that are neither general nor representative. In this paper, we introduce a new, yet conceptually simple, neural architecture, termed SpinNet, to extract local features that are rotationally invariant whilst sufficiently informative to enable accurate registration. A Spatial Point Transformer is first introduced to map the input local surface into a carefully designed cylindrical space, enabling end-to-end optimization with an SO(2)-equivariant representation. A Neural Feature Extractor, which leverages powerful point-based and 3D cylindrical convolutional neural layers, is then utilized to derive a compact and representative descriptor for matching. Extensive experiments on both indoor and outdoor datasets demonstrate that SpinNet outperforms existing state-of-the-art techniques by a large margin. More critically, it has the best generalization ability across unseen scenarios with different sensor modalities. The code is available at https://github.com/QingyongHu/SpinNet.
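A hedged sketch of the coordinate step behind such a transformer: align the patch with an estimated reference axis (here, the smallest-variance eigenvector of the neighborhood covariance as a normal proxy), then express the points in cylindrical coordinates (rho, phi, z). A rotation of the patch about that axis then only shifts phi, which is the SO(2) structure the abstract refers to. The normal estimate and frame construction are assumptions, not SpinNet's exact transformer.

```python
import numpy as np

def to_cylindrical(patch):
    """patch: (N, 3) local points, already centered on the keypoint."""
    cov = patch.T @ patch
    _, vecs = np.linalg.eigh(cov)   # eigenvectors, ascending eigenvalue order
    axis = vecs[:, 0]               # normal proxy: smallest-variance direction
    # Build an orthonormal frame whose third axis is `axis`.
    tmp = np.array([1.0, 0.0, 0.0]) if abs(axis[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    x = np.cross(tmp, axis); x /= np.linalg.norm(x)
    y = np.cross(axis, x)
    local = patch @ np.stack([x, y, axis], axis=1)  # rotate into the frame
    rho = np.linalg.norm(local[:, :2], axis=1)
    phi = np.arctan2(local[:, 1], local[:, 0])
    return np.stack([rho, phi, local[:, 2]], axis=1)
```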