Pointwise Rotation-Invariant Network with Adaptive Sampling and 3D Spherical Voxel Convolution

71 0 0.0 ( 0 )

Download Cite

Added by Yang You

Publication date 2018

fields Informatics Engineering

and research's language is English

Authors Yang You - Yujing Lou - Qi Liu

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Point cloud analysis without pose priors is very challenging in real applications, as the orientations of point clouds are often unknown. In this paper, we propose a brand new point-set learning framework PRIN, namely, Pointwise Rotation-Invariant Network, focusing on rotation-invariant feature extraction in point clouds analysis. We construct spherical signals by Density Aware Adaptive Sampling to deal with distorted point distributions in spherical space. In addition, we propose Spherical Voxel Convolution and Point Re-sampling to extract rotation-invariant features for each point. Our network can be applied to tasks ranging from object classification, part segmentation, to 3D feature matching and label alignment. We show that, on the dataset with randomly rotated point clouds, PRIN demonstrates better performance than state-of-the-art methods without any data augmentation. We also provide theoretical analysis for the rotation-invariance achieved by our methods.

rate research

Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution

177 - Haotian Tang , Zhijian Liu , Shengyu Zhao 2020

Self-driving cars need to understand 3D scenes efficiently and accurately in order to drive safely. Given the limited hardware resources, existing 3D perception models are not able to recognize small instances (e.g., pedestrians, cyclists) very well due to the low-resolution voxelization and aggressive downsampling. To this end, we propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with the high-resolution point-based branch. With negligible overhead, this point-based branch is able to preserve the fine details even from large outdoor scenes. To explore the spectrum of efficient 3D models, we first define a flexible architecture design space based on SPVConv, and we then present 3D Neural Architecture Search (3D-NAS) to search the optimal network architecture over this diverse design space efficiently and effectively. Experimental results validate that the resulting SPVNAS model is fast and accurate: it outperforms the state-of-the-art MinkowskiNet by 3.3%, ranking 1st on the competitive SemanticKITTI leaderboard. It also achieves 8x computation reduction and 3x measured speedup over MinkowskiNet with higher accuracy. Finally, we transfer our method to 3D object detection, and it achieves consistent improvements over the one-stage detection baseline on KITTI.

Computer Vision and Pattern Recognition

Effective Rotation-invariant Point CNN with Spherical Harmonics kernels

62 - Adrien Poulenard , Marie-Julie Rakotosaona , Yann Ponty 2019

We present a novel rotation invariant architecture operating directly on point cloud data. We demonstrate how rotation invariance can be injected into a recently proposed point-based PCNN architecture, at all layers of the network, achieving invariance to both global shape transformations, and to local rotations on the level of patches or parts, useful when dealing with non-rigid objects. We achieve this by employing a spherical harmonics based kernel at different layers of the network, which is guaranteed to be invariant to rigid motions. We also introduce a more efficient pooling operation for PCNN using space-partitioning data-structures. This results in a flexible, simple and efficient architecture that achieves accurate results on challenging shape analysis tasks including classification and segmentation, without requiring data-augmentation, typically employed by non-invariant approaches.

Computer Vision and Pattern Recognition

MG-SAGC: A multiscale graph and its self-adaptive graph convolution network for 3D point clouds

131 - Bo Wu , Bo Lang 2020

To enhance the ability of neural networks to extract local point cloud features and improve their quality, in this paper, we propose a multiscale graph generation method and a self-adaptive graph convolution method. First, we propose a multiscale graph generation method for point clouds. This approach transforms point clouds into a structured multiscale graph form that supports multiscale analysis of point clouds in the scale space and can obtain the dimensional features of point cloud data at different scales, thus making it easier to obtain the best point cloud features. Because traditional convolutional neural networks are not applicable to graph data with irregular vertex neighborhoods, this paper presents an sef-adaptive graph convolution kernel that uses the Chebyshev polynomial to fit an irregular convolution filter based on the theory of optimal approximation. In this paper, we adopt max pooling to synthesize the features of different scale maps and generate the point cloud features. In experiments conducted on three widely used public datasets, the proposed method significantly outperforms other state-of-the-art models, demonstrating its effectiveness and generalizability.

Computer Vision and Pattern Recognition

VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation

120 - Zeyu Hu , Xuyang Bai , Jiaxiang Shang 2021

In recent years, sparse voxel-based methods have become the state-of-the-arts for 3D semantic segmentation of indoor scenes, thanks to the powerful 3D CNNs. Nevertheless, being oblivious to the underlying geometry, voxel-based methods suffer from ambiguous features on spatially close objects and struggle with handling complex and irregular geometries due to the lack of geodesic information. In view of this, we present Voxel-Mesh Network (VMNet), a novel 3D deep architecture that operates on the voxel and mesh representations leveraging both the Euclidean and geodesic information. Intuitively, the Euclidean information extracted from voxels can offer contextual cues representing interactions between nearby objects, while the geodesic information extracted from meshes can help separate objects that are spatially close but have disconnected surfaces. To incorporate such information from the two domains, we design an intra-domain attentive module for effective feature aggregation and an inter-domain attentive module for adaptive feature fusion. Experimental results validate the effectiveness of VMNet: specifically, on the challenging ScanNet dataset for large-scale segmentation of indoor scenes, it outperforms the state-of-the-art SparseConvNet and MinkowskiNet (74.6% vs 72.5% and 73.6% in mIoU) with a simpler network structure (17M vs 30M and 38M parameters). Code release: https://github.com/hzykent/VMNet

Computer Vision and Pattern Recognition

Attentive Rotation Invariant Convolution for Point Cloud-based Large Scale Place Recognition

87 - Zhaoxin Fan , Zhenbo Song , Wenping Zhang 2021

Autonomous Driving and Simultaneous Localization and Mapping(SLAM) are becoming increasingly important in real world, where point cloud-based large scale place recognition is the spike of them. Previous place recognition methods have achieved acceptable performances by regarding the task as a point cloud retrieval problem. However, all of them are suffered from a common defect: they cant handle the situation when the point clouds are rotated, which is common, e.g, when viewpoints or motorcycle types are changed. To tackle this issue, we propose an Attentive Rotation Invariant Convolution (ARIConv) in this paper. The ARIConv adopts three kind of Rotation Invariant Features (RIFs): Spherical Signals (SS), Individual-Local Rotation Invariant Features (ILRIF) and Group-Local Rotation Invariant features (GLRIF) in its structure to learn rotation invariant convolutional kernels, which are robust for learning rotation invariant point cloud features. Whats more, to highlight pivotal RIFs, we inject an attentive module in ARIConv to give different RIFs different importance when learning kernels. Finally, utilizing ARIConv, we build a DenseNet-like network architecture to learn rotation-insensitive global descriptors used for retrieving. We experimentally demonstrate that our model can achieve state-of-the-art performance on large scale place recognition task when the point cloud scans are rotated and can achieve comparable results with most of existing methods on the original non-rotated datasets.

Computer Vision and Pattern Recognition