We present a simple yet effective prediction module for one-stage detectors. The module operates in a coarse-to-fine manner. First, it roughly adjusts the default boxes so that they better capture the extent of target objects in the image. Second, given the adjusted boxes, it aligns the receptive fields of the convolution filters accordingly, without requiring any embedding layers. Together, the two steps form a propose-and-attend mechanism that mimics two-stage detectors in a highly efficient manner. To verify its effectiveness, we apply the proposed module to a basic one-stage detector, SSD. Our final model achieves accuracy comparable to that of state-of-the-art detectors while using a fraction of their model parameters and computational overhead. Moreover, we find that the proposed module has two strong applications: 1) it can be successfully integrated into a lightweight backbone, further improving the efficiency of the one-stage detector; and 2) it allows training from scratch without relying on any sophisticated base networks, as previous methods do.
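The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `refine_box` and `kernel_offsets`, the SSD-style variance values, and the rule that spreads the kernel's sampling points evenly over the refined box are all assumptions made for illustration. Coordinates are assumed to be in feature-map units.

```python
import math

def refine_box(default_box, deltas, variances=(0.1, 0.2)):
    """Step 1 (assumed SSD-style decoding): shift and rescale a default box.

    default_box is (cx, cy, w, h); deltas is the predicted (dx, dy, dw, dh).
    The variance values are a common SSD convention, assumed here.
    """
    cx, cy, w, h = default_box
    dx, dy, dw, dh = deltas
    return (
        cx + dx * variances[0] * w,
        cy + dy * variances[0] * h,
        w * math.exp(dw * variances[1]),
        h * math.exp(dh * variances[1]),
    )

def kernel_offsets(center, box, kernel_size=3):
    """Step 2 (hypothetical rule): offsets that move a regular k x k
    sampling grid centred at `center` so that it spans the refined box.

    Returns a row-major list of (offset_x, offset_y) pairs, one per tap.
    """
    if kernel_size < 2:
        raise ValueError("kernel_size must be at least 2")
    px, py = center
    cx, cy, w, h = box
    half = (kernel_size - 1) / 2
    offsets = []
    for i in range(kernel_size):        # kernel rows (y direction)
        for j in range(kernel_size):    # kernel columns (x direction)
            # Where a standard convolution would sample (unit spacing).
            regular_x = px + (j - half)
            regular_y = py + (i - half)
            # Where we want to sample: evenly spread from edge to edge of the box.
            target_x = cx + (j - half) * w / (kernel_size - 1)
            target_y = cy + (i - half) * h / (kernel_size - 1)
            offsets.append((target_x - regular_x, target_y - regular_y))
    return offsets

# Usage: refine a default box, then derive the sampling offsets for a 3x3 filter.
refined = refine_box((10.0, 10.0, 4.0, 4.0), (0.5, 0.0, 1.0, 0.0))
offsets = kernel_offsets((10.0, 10.0), refined)
```

Offsets of this kind could be fed to a deformable-convolution operator so that the filter attends to the proposed region; whether the original module computes them this way is not stated in the abstract.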
Single-shot detectors, which are potentially faster and simpler than two-stage detectors, tend to be more applicable to object detection in videos. Nevertheless, extending such object detectors from images to video is not trivial, especially when a
We present a novel single-shot text detector that directly outputs word-level bounding boxes in a natural image. We propose an attention mechanism which roughly identifies text regions via an automatically learned attentional map. This substantially
We propose a robust solution to future trajectory forecasting that is practically applicable to autonomous agents in highly crowded environments. To this end, the paper addresses three aspects. First, we use composite fields to p
Fake face detection is a significant challenge for intelligent systems as generative models grow more powerful every day. As the quality of fake faces increases, trained models become increasingly ineffective at detecting novel fake fa
For most of the object detectors based on multi-scale feature maps, the shallow layers are rich in fine spatial information and thus mainly responsible for small object detection. The performance of small object detection, however, is still less than