ﻻ يوجد ملخص باللغة العربية
Point clouds and images could provide complementary information when representing 3D objects. Fusing the two kinds of data usually helps to improve the detection results. However, it is challenging to fuse the two data modalities, due to their different characteristics and the interference from the non-interest areas. To solve this problem, we propose a Multi-Branch Deep Fusion Network (MBDF-Net) for 3D object detection. The proposed detector has two stages. In the first stage, our multi-branch feature extraction network utilizes Adaptive Attention Fusion (AAF) modules to produce cross-modal fusion features from single-modal semantic features. In the second stage, we use a region of interest (RoI) -pooled fusion module to generate enhanced local features for refinement. A novel attention-based hybrid sampling strategy is also proposed for selecting key points in the downsampling process. We evaluate our approach on two widely used benchmark datasets including KITTI and SUN-RGBD. The experimental results demonstrate the advantages of our method over state-of-the-art approaches.
3D object detection based on LiDAR-camera fusion is becoming an emerging research theme for autonomous driving. However, it has been surprisingly difficult to effectively fuse both modalities without information loss and interference. To solve this i
3D object detection is an important module in autonomous driving and robotics. However, many existing methods focus on using single frames to perform 3D detection, and do not fully utilize information from multiple frames. In this paper, we present 3
Salient object detection(SOD) aims at locating the most significant object within a given image. In recent years, great progress has been made in applying SOD on many vision tasks. The depth map could provide additional spatial prior and boundary cue
In this paper, we propose to learn a deep fitting degree scoring network for monocular 3D object detection, which aims to score fitting degree between proposals and object conclusively. Different from most existing monocular frameworks which use tigh
Video object detection is a tough task due to the deteriorated quality of video sequences captured under complex environments. Currently, this area is dominated by a series of feature enhancement based methods, which distill beneficial semantic infor