Do you want to publish a course? Click here

Person-MinkUNet: 3D Person Detection with LiDAR Point Cloud

109   0   0.0 ( 0 )
 Added by Dan Jia
 Publication date 2021
and research's language is English




Ask ChatGPT about the research

In this preliminary work we attempt to apply submanifold sparse convolution to the task of 3D person detection. In particular, we present Person-MinkUNet, a single-stage 3D person detection network based on Minkowski Engine with U-Net architecture. The network achieves a 76.4% average precision (AP) on the JRDB 3D detection benchmark.



rate research

Read More

114 - Liang Peng , Fei Liu , Zhengxu Yu 2021
Monocular 3D detection currently struggles with extremely lower detection rates compared to LiDAR-based methods. The poor accuracy is mainly caused by the absence of accurate location cues due to the ill-posed nature of monocular imagery. LiDAR point clouds, which provide precise spatial measurement, can offer beneficial information for the training of monocular methods. To make use of LiDAR point clouds, prior works project them to form depth map labels, subsequently training a dense depth estimator to extract explicit location features. This indirect and complicated way introduces intermediate products, i.e., depth map predictions, taking much computation costs as well as leading to suboptimal performances. In this paper, we propose LPCG (LiDAR point cloud guided monocular 3D object detection), which is a general framework for guiding the training of monocular 3D detectors with LiDAR point clouds. Specifically, we use LiDAR point clouds to generate pseudo labels, allowing monocular 3D detectors to benefit from easy-collected massive unlabeled data. LPCG works well under both supervised and unsupervised setups. Thanks to a general design, LPCG can be plugged into any monocular 3D detector, significantly boosting the performance. As a result, we take the first place on KITTI monocular 3D/BEV (birds-eye-view) detection benchmark with a considerable margin. The code will be made publicly available soon.
It is laborious to manually label point cloud data for training high-quality 3D object detectors. This work proposes a weakly supervised approach for 3D object detection, only requiring a small set of weakly annotated scenes, associated with a few precisely labeled object instances. This is achieved by a two-stage architecture design. Stage-1 learns to generate cylindrical object proposals under weak supervision, i.e., only the horizontal centers of objects are click-annotated on birds view scenes. Stage-2 learns to refine the cylindrical proposals to get cuboids and confidence scores, using a few well-labeled object instances. Using only 500 weakly annotated scenes and 534 precisely labeled vehicle instances, our method achieves 85-95% the performance of current top-leading, fully supervised detectors (which require 3, 712 exhaustively and precisely annotated scenes with 15, 654 instances). More importantly, with our elaborately designed network architecture, our trained model can be applied as a 3D object annotator, allowing both automatic and active working modes. The annotations generated by our model can be used to train 3D object detectors with over 94% of their original performance (under manually labeled data). Our experiments also show our models potential in boosting performance given more training data. Above designs make our approach highly practical and introduce new opportunities for learning 3D object detection with reduced annotation burden.
176 - Yiming Zhao , Lin Bai , 2021
Projecting the point cloud on the 2D spherical range image transforms the LiDAR semantic segmentation to a 2D segmentation task on the range image. However, the LiDAR range image is still naturally different from the regular 2D RGB image; for example, each position on the range image encodes the unique geometry information. In this paper, we propose a new projection-based LiDAR semantic segmentation pipeline that consists of a novel network structure and an efficient post-processing step. In our network structure, we design a FID (fully interpolation decoding) module that directly upsamples the multi-resolution feature maps using bilinear interpolation. Inspired by the 3D distance interpolation used in PointNet++, we argue this FID module is a 2D version distance interpolation on $(theta, phi)$ space. As a parameter-free decoding module, the FID largely reduces the model complexity by maintaining good performance. Besides the network structure, we empirically find that our model predictions have clear boundaries between different semantic classes. This makes us rethink whether the widely used K-nearest-neighbor post-processing is still necessary for our pipeline. Then, we realize the many-to-one mapping causes the blurring effect that some points are mapped into the same pixel and share the same label. Therefore, we propose to process those occluded points by assigning the nearest predicted label to them. This NLA (nearest label assignment) post-processing step shows a better performance than KNN with faster inference speed in the ablation study. On the SemanticKITTI dataset, our pipeline achieves the best performance among all projection-based methods with $64 times 2048$ resolution and all point-wise solutions. With a ResNet-34 as the backbone, both the training and testing of our model can be finished on a single RTX 2080 Ti with 11G memory. The code is released.
Lidar sensors are frequently used in environment perception for autonomous vehicles and mobile robotics to complement camera, radar, and ultrasonic sensors. Adverse weather conditions are significantly impacting the performance of lidar-based scene understanding by causing undesired measurement points that in turn effect missing detections and false positives. In heavy rain or dense fog, water drops could be misinterpreted as objects in front of the vehicle which brings a mobile robot to a full stop. In this paper, we present the first CNN-based approach to understand and filter out such adverse weather effects in point cloud data. Using a large data set obtained in controlled weather environments, we demonstrate a significant performance improvement of our method over state-of-the-art involving geometric filtering. Data is available at https://github.com/rheinzler/PointCloudDeNoising.
Deep learning is the essential building block of state-of-the-art person detectors in 2D range data. However, only a few annotated datasets are available for training and testing these deep networks, potentially limiting their performance when deployed in new environments or with different LiDAR models. We propose a method, which uses bounding boxes from an image-based detector (e.g. Faster R-CNN) on a calibrated camera to automatically generate training labels (called pseudo-labels) for 2D LiDAR-based person detectors. Through experiments on the JackRabbot dataset with two detector models, DROW3 and DR-SPAAM, we show that self-supervised detectors, trained or fine-tuned with pseudo-labels, outperform detectors trained only on a different dataset. Combined with robust training techniques, the self-supervised detectors reach a performance close to the ones trained using manual annotations of the target dataset. Our method is an effective way to improve person detectors during deployment without any additional labeling effort, and we release our source code to support relevant robotic applications.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا