ﻻ يوجد ملخص باللغة العربية
Object detection with Transformers (DETR) has achieved a competitive performance over traditional detectors, such as Faster R-CNN. However, the potential of DETR remains largely unexplored for the more challenging task of arbitrary-oriented object detection problem. We provide the first attempt and implement Oriented Object DEtection with TRansformer ($bf O^2DETR$) based on an end-to-end network. The contributions of $rm O^2DETR$ include: 1) we provide a new insight into oriented object detection, by applying Transformer to directly and efficiently localize objects without a tedious process of rotated anchors as in conventional detectors; 2) we design a simple but highly efficient encoder for Transformer by replacing the attention mechanism with depthwise separable convolution, which can significantly reduce the memory and computational cost of using multi-scale features in the original Transformer; 3) our $rm O^2DETR$ can be another new benchmark in the field of oriented object detection, which achieves up to 3.85 mAP improvement over Faster R-CNN and RetinaNet. We simply fine-tune the head mounted on $rm O^2DETR$ in a cascaded architecture and achieve a competitive performance over SOTA in the DOTA dataset.
The transformer networks are particularly good at modeling long-range dependencies within a long sequence. In this paper, we conduct research on applying the transformer networks for salient object detection (SOD). We adopt the dense transformer back
Though 3D object detection from point clouds has achieved rapid progress in recent years, the lack of flexible and high-performance proposal refinement remains a great hurdle for existing state-of-the-art two-stage detectors. Previous works on refini
In contrast to the oriented bounding boxes, point set representation has great potential to capture the detailed structure of instances with the arbitrary orientations, large aspect ratios and dense distribution in aerial images. However, the convent
Current state-of-the-art two-stage detectors generate oriented proposals through time-consuming schemes. This diminishes the detectors speed, thereby becoming the computational bottleneck in advanced oriented object detection systems. This work propo
We present Voxel Transformer (VoTr), a novel and effective voxel-based Transformer backbone for 3D object detection from point clouds. Conventional 3D convolutional backbones in voxel-based 3D detectors cannot efficiently capture large context inform