No Arabic abstract
To navigate through urban roads, an automated vehicle must be able to perceive and recognize objects in a three-dimensional environment. A high-level contextual understanding of the surroundings is necessary to plan and execute accurate driving maneuvers. This paper presents an approach to fuse different sensory information, Light Detection and Ranging (lidar) scans and camera images. The output of a convolutional neural network (CNN) is used as classifier to obtain the labels of the environment. The transference of semantic information between the labelled image and the lidar point cloud is performed in four steps: initially, we use heuristic methods to associate probabilities to all the semantic classes contained in the labelled images. Then, the lidar points are corrected to compensate for the vehicles motion given the difference between the timestamps of each lidar scan and camera image. In a third step, we calculate the pixel coordinate for the corresponding camera image. In the last step we perform the transfer of semantic information from the heuristic probability images to the lidar frame, while removing the lidar information that is not visible to the camera. We tested our approach in the Usyd Dataset cite{usyd_dataset}, obtaining qualitative and quantitative results that demonstrate the validity of our probabilistic sensory fusion approach.
We propose an algorithm for automatic, targetless, extrinsic calibration of a LiDAR and camera system using semantic information. We achieve this goal by maximizing mutual information (MI) of semantic information between sensors, leveraging a neural network to estimate semantic mutual information, and matrix exponential for calibration computation. Using kernel-based sampling to sample data from camera measurement based on LiDAR projected points, we formulate the problem as a novel differentiable objective function which supports the use of gradient-based optimization methods. We also introduce an initial calibration method using 2D MI-based image registration. Finally, we demonstrate the robustness of our method and quantitatively analyze the accuracy on a synthetic dataset and also evaluate our algorithm qualitatively on KITTI360 and RELLIS-3D benchmark datasets, showing improvement over recent comparable approaches.
Camera and 3D LiDAR sensors have become indispensable devices in modern autonomous driving vehicles, where the camera provides the fine-grained texture, color information in 2D space and LiDAR captures more precise and farther-away distance measurements of the surrounding environments. The complementary information from these two sensors makes the two-modality fusion be a desired option. However, two major issues of the fusion between camera and LiDAR hinder its performance, ie, how to effectively fuse these two modalities and how to precisely align them (suffering from the weak spatiotemporal synchronization problem). In this paper, we propose a coarse-to-fine LiDAR and camera fusion-based network (termed as LIF-Seg) for LiDAR segmentation. For the first issue, unlike these previous works fusing the point cloud and image information in a one-to-one manner, the proposed method fully utilizes the contextual information of images and introduces a simple but effective early-fusion strategy. Second, due to the weak spatiotemporal synchronization problem, an offset rectification approach is designed to align these two-modality features. The cooperation of these two components leads to the success of the effective camera-LiDAR fusion. Experimental results on the nuScenes dataset show the superiority of the proposed LIF-Seg over existing methods with a large margin. Ablation studies and analyses demonstrate that our proposed LIF-Seg can effectively tackle the weak spatiotemporal synchronization problem.
Selecting safe landing sites in non-cooperative environments is a key step towards the full autonomy of UAVs. However, the existing methods have the common problems of poor generalization ability and robustness. Their performance in unknown environments is significantly degraded and the error cannot be self-detected and corrected. In this paper, we construct a UAV system equipped with low-cost LiDAR and binocular cameras to realize autonomous landing in non-cooperative environments by detecting the flat and safe ground area. Taking advantage of the non-repetitive scanning and high FOV coverage characteristics of LiDAR, we come up with a dynamic time depth completion algorithm. In conjunction with the proposed self-evaluation method of the depth map, our model can dynamically select the LiDAR accumulation time at the inference phase to ensure an accurate prediction result. Based on the depth map, the high-level terrain information such as slope, roughness, and the size of the safe area are derived. We have conducted extensive autonomous landing experiments in a variety of familiar or completely unknown environments, verifying that our model can adaptively balance the accuracy and speed, and the UAV can robustly select a safe landing site.
In autonomous driving, using a variety of sensors to recognize preceding vehicles in middle and long distance is helpful for improving driving performance and developing various functions. However, if only LiDAR or camera is used in the recognition stage, it is difficult to obtain necessary data due to the limitations of each sensor. In this paper, we proposed a method of converting the tracking data of vision into birds eye view (BEV) coordinates using an equation that projects LiDAR points onto an image, and a method of fusion between LiDAR and vision tracked data. Thus, the newly proposed method was effective through the results of detecting closest in-path vehicle (CIPV) in various situations. In addition, even when experimenting with the EuroNCAP autonomous emergency braking (AEB) test protocol using the result of fusion, AEB performance is improved through improved cognitive performance than when using only LiDAR. In experimental results, the performance of the proposed method was proved through actual vehicle tests in various scenarios. Consequently, it is convincing that the newly proposed sensor fusion method significantly improves the ACC function in autonomous maneuvering. We expect that this improvement in perception performance will contribute to improving the overall stability of ACC.
3D LiDAR (light detection and ranging) semantic segmentation is important in scene understanding for many applications, such as auto-driving and robotics. For example, for autonomous cars equipped with RGB cameras and LiDAR, it is crucial to fuse complementary information from different sensors for robust and accurate segmentation. Existing fusion-based methods, however, may not achieve promising performance due to the vast difference between the two modalities. In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) to exploit perceptual information from two modalities, namely, appearance information from RGB images and spatio-depth information from point clouds. To this end, we first project point clouds to the camera coordinates to provide spatio-depth information for RGB images. Then, we propose a two-stream network to extract features from the two modalities, separately, and fuse the features by effective residual-based fusion modules. Moreover, we propose additional perception-aware losses to measure the perceptual difference between the two modalities. Extensive experiments on two benchmark data sets show the superiority of our method. For example, on nuScenes, our PMF outperforms the state-of-the-art method by 0.8 in mIoU.