ﻻ يوجد ملخص باللغة العربية
We present a flexible and high-performance framework, named Pyramid R-CNN, for two-stage 3D object detection from point clouds. Current approaches generally rely on the points or voxels of interest for RoI feature extraction on the second stage, but cannot effectively handle the sparsity and non-uniform distribution of those points, and this may result in failures in detecting objects that are far away. To resolve the problems, we propose a novel second-stage module, named pyramid RoI head, to adaptively learn the features from the sparse points of interest. The pyramid RoI head consists of three key components. Firstly, we propose the RoI-grid Pyramid, which mitigates the sparsity problem by extensively collecting points of interest for each RoI in a pyramid manner. Secondly, we propose RoI-grid Attention, a new operation that can encode richer information from sparse points by incorporating conventional attention-based and graph-based point operators into a unified formulation. Thirdly, we propose the Density-Aware Radius Prediction (DARP) module, which can adapt to different point density levels by dynamically adjusting the focusing range of RoIs. Combining the three components, our pyramid RoI head is robust to the sparse and imbalanced circumstances, and can be applied upon various 3D backbones to consistently boost the detection performance. Extensive experiments show that Pyramid R-CNN outperforms the state-of-the-art 3D detection models by a large margin on both the KITTI dataset and the Waymo Open dataset.
Recent advances on 3D object detection heavily rely on how the 3D data are represented, emph{i.e.}, voxel-based or point-based representation. Many existing high performance 3D detectors are point-based because this structure can better retain precis
Compared with model architectures, the training process, which is also crucial to the success of detectors, has received relatively less attention in object detection. In this work, we carefully revisit the standard training practice of detectors, an
Current state-of-the-art two-stage detectors generate oriented proposals through time-consuming schemes. This diminishes the detectors speed, thereby becoming the computational bottleneck in advanced oriented object detection systems. This work propo
As an emerging data modal with precise distance sensing, LiDAR point clouds have been placed great expectations on 3D scene understanding. However, point clouds are always sparsely distributed in the 3D space, and with unstructured storage, which mak
The continual learning problem has been widely studied in image classification, while rare work has been explored in object detection. Some recent works apply knowledge distillation to constrain the model to retain old knowledge, but this rigid const