ترغب بنشر مسار تعليمي؟ اضغط هنا

An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds

76   0   0.0 ( 0 )
 نشر من قبل Rui Huang
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Detecting objects in 3D LiDAR data is a core technology for autonomous driving and other robotics applications. Although LiDAR data is acquired over time, most of the 3D object detection algorithms propose object bounding boxes independently for each frame and neglect the useful information available in the temporal domain. To address this problem, in this paper we propose a sparse LSTM-based multi-frame 3d object detection algorithm. We use a U-Net style 3D sparse convolution network to extract features for each frames LiDAR point-cloud. These features are fed to the LSTM module together with the hidden and memory features from last frame to predict the 3d objects in the current frame as well as hidden and memory features that are passed to the next frame. Experiments on the Waymo Open Dataset show that our algorithm outperforms the traditional frame by frame approach by 7.5% [email protected] and other multi-frame approaches by 1.2% while using less memory and computation per frame. To the best of our knowledge, this is the first work to use an LSTM for 3D object detection in sparse point clouds.



قيم البحث

اقرأ أيضاً

LiDAR point clouds contain measurements of complicated natural scenes and can be used to update digital elevation models, glacial monitoring, detecting faults and measuring uplift detecting, forest inventory, detect shoreline and beach volume changes , landslide risk analysis, habitat mapping, and urban development, among others. A very important application is the classification of the 3D cloud into elementary classes. For example, it can be used to differentiate between vegetation, man-made structures, and water. Our goal is to present a preliminary comparison study for the classification of 3D point cloud LiDAR data that includes several types of feature engineering. In particular, we demonstrate that providing context by augmenting each point in the LiDAR point cloud with information about its neighboring points can improve the performance of downstream learning algorithms. We also experiment with several dimension reduction strategies, ranging from Principal Component Analysis (PCA) to neural network-based auto-encoders, and demonstrate how they affect classification performance in LiDAR point clouds. For instance, we observe that combining feature engineering with a dimension reduction a method such as PCA, there is an improvement in the accuracy of the classification with respect to doing a straightforward classification with the raw data.
114 - Liang Peng , Fei Liu , Zhengxu Yu 2021
Monocular 3D detection currently struggles with extremely lower detection rates compared to LiDAR-based methods. The poor accuracy is mainly caused by the absence of accurate location cues due to the ill-posed nature of monocular imagery. LiDAR point clouds, which provide precise spatial measurement, can offer beneficial information for the training of monocular methods. To make use of LiDAR point clouds, prior works project them to form depth map labels, subsequently training a dense depth estimator to extract explicit location features. This indirect and complicated way introduces intermediate products, i.e., depth map predictions, taking much computation costs as well as leading to suboptimal performances. In this paper, we propose LPCG (LiDAR point cloud guided monocular 3D object detection), which is a general framework for guiding the training of monocular 3D detectors with LiDAR point clouds. Specifically, we use LiDAR point clouds to generate pseudo labels, allowing monocular 3D detectors to benefit from easy-collected massive unlabeled data. LPCG works well under both supervised and unsupervised setups. Thanks to a general design, LPCG can be plugged into any monocular 3D detector, significantly boosting the performance. As a result, we take the first place on KITTI monocular 3D/BEV (birds-eye-view) detection benchmark with a considerable margin. The code will be made publicly available soon.
It is laborious to manually label point cloud data for training high-quality 3D object detectors. This work proposes a weakly supervised approach for 3D object detection, only requiring a small set of weakly annotated scenes, associated with a few pr ecisely labeled object instances. This is achieved by a two-stage architecture design. Stage-1 learns to generate cylindrical object proposals under weak supervision, i.e., only the horizontal centers of objects are click-annotated on birds view scenes. Stage-2 learns to refine the cylindrical proposals to get cuboids and confidence scores, using a few well-labeled object instances. Using only 500 weakly annotated scenes and 534 precisely labeled vehicle instances, our method achieves 85-95% the performance of current top-leading, fully supervised detectors (which require 3, 712 exhaustively and precisely annotated scenes with 15, 654 instances). More importantly, with our elaborately designed network architecture, our trained model can be applied as a 3D object annotator, allowing both automatic and active working modes. The annotations generated by our model can be used to train 3D object detectors with over 94% of their original performance (under manually labeled data). Our experiments also show our models potential in boosting performance given more training data. Above designs make our approach highly practical and introduce new opportunities for learning 3D object detection with reduced annotation burden.
104 - Rui Qian , Xin Lai , Xirong Li 2021
Currently, existing state-of-the-art 3D object detectors are in two-stage paradigm. These methods typically comprise two steps: 1) Utilize region proposal network to propose a fraction of high-quality proposals in a bottom-up fashion. 2) Resize and p ool the semantic features from the proposed regions to summarize RoI-wise representations for further refinement. Note that these RoI-wise representations in step 2) are considered individually as an uncorrelated entry when fed to following detection headers. Nevertheless, we observe these proposals generated by step 1) offset from ground truth somehow, emerging in local neighborhood densely with an underlying probability. Challenges arise in the case where a proposal largely forsakes its boundary information due to coordinate offset while existing networks lack corresponding information compensation mechanism. In this paper, we propose BANet for 3D object detection from point clouds. Specifically, instead of refining each proposal independently as previous works do, we represent each proposal as a node for graph construction within a given cut-off threshold, associating proposals in the form of local neighborhood graph, with boundary correlations of an object being explicitly exploited. Besides, we devise a lightweight Region Feature Aggregation Network to fully exploit voxel-wise, pixel-wise, and point-wise feature with expanding receptive fields for more informative RoI-wise representations. As of Apr. 17th, 2021, our BANet achieves on par performance on KITTI 3D detection leaderboard and ranks $1^{st}$ on $Moderate$ difficulty of $Car$ category on KITTI BEV detection leaderboard. The source code will be released once the paper is accepted.
137 - Zengyi Qin , Jinglu Wang , Yan Lu 2020
A crucial task in scene understanding is 3D object detection, which aims to detect and localize the 3D bounding boxes of objects belonging to specific classes. Existing 3D object detectors heavily rely on annotated 3D bounding boxes during training, while these annotations could be expensive to obtain and only accessible in limited scenarios. Weakly supervised learning is a promising approach to reducing the annotation requirement, but existing weakly supervised object detectors are mostly for 2D detection rather than 3D. In this work, we propose VS3D, a framework for weakly supervised 3D object detection from point clouds without using any ground truth 3D bounding box for training. First, we introduce an unsupervised 3D proposal module that generates object proposals by leveraging normalized point cloud densities. Second, we present a cross-modal knowledge distillation strategy, where a convolutional neural network learns to predict the final results from the 3D object proposals by querying a teacher network pretrained on image datasets. Comprehensive experiments on the challenging KITTI dataset demonstrate the superior performance of our VS3D in diverse evaluation settings. The source code and pretrained models are publicly available at https://github.com/Zengyi-Qin/Weakly-Supervised-3D-Object-Detection.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا