For reliable environment perception, temporal information is essential in some situations. In object detection especially, some scenes can only be interpreted correctly with temporal information. Since image-based object detectors are currently based almost exclusively on CNN architectures, extending their feature extraction with temporal features seems promising. In this work we investigate different architectural components for CNN-based temporal information extraction. We present a Temporal Feature Network based on the insights gained from these architectural investigations. The network is trained from scratch, without ImageNet-based pre-training, because ImageNet images are not available with temporal information. The object detector based on this network is evaluated against its non-temporal counterpart as a baseline and achieves competitive results on the KITTI object detection dataset.
Video object detection is a challenging task because isolated video frames may suffer from appearance deterioration, which makes detection considerably harder. One popular solution is to exploit temporal information and enhance per-
Object detection is one of the most active areas in computer vision and has improved significantly in recent years. Current state-of-the-art object detection methods mostly adhere to the framework of regions with convolutional neural network
Current state-of-the-art two-stage detectors generate oriented proposals through time-consuming schemes. This reduces the detectors' speed and makes proposal generation the computational bottleneck in advanced oriented object detection systems. This work propo
Recent advances in 3D object detection rely heavily on how the 3D data are represented, i.e., voxel-based or point-based representation. Many existing high-performance 3D detectors are point-based because this structure can better retain precis
Dense object detectors rely on the sliding-window paradigm, which predicts objects over a regular grid on the image. The feature maps at the grid points are used to generate the bounding box predictions. The point feature is convenie