ﻻ يوجد ملخص باللغة العربية
Video object detection (VID) has been vigorously studied for years but almost all literature adopts a static accuracy-based evaluation, i.e., average precision (AP). From a robotic perspective, the importance of recall continuity and localization stability is equal to that of accuracy, but the AP is insufficient to reflect detectors performance across time. In this paper, non-reference assessments are proposed for continuity and stability based on object tracklets. These temporal evaluations can serve as supplements to static AP. Further, we develop an online tracklet refinement for improving detectors temporal performance through short tracklet suppression, fragment filling, and temporal location fusion. In addition, we propose a small-overlap suppression to extend VID methods to single object tracking (SOT) task so that a flexible SOT-by-detection framework is then formed. Extensive experiments are conducted on ImageNet VID dataset and real-world robotic tasks, where the superiority of our proposed approaches are validated and verified. Codes will be publicly available.
The convention standard for object detection uses a bounding box to represent each individual object instance. However, it is not practical in the industry-relevant applications in the context of warehouses due to severe occlusions among groups of in
Many modern robotics systems employ LiDAR as their main sensing modality due to its geometrical richness. Rolling shutter LiDARs are particularly common, in which an array of lasers scans the scene from a rotating base. Points are emitted as a stream
Object classification with 3D data is an essential component of any scene understanding method. It has gained significant interest in a variety of communities, most notably in robotics and computer graphics. While the advent of deep learning has prog
DETR is a recently proposed Transformer-based method which views object detection as a set prediction problem and achieves state-of-the-art performance but demands extra-long training time to converge. In this paper, we investigate the causes of the
Recent advances in semi-supervised object detection (SSOD) are largely driven by consistency-based pseudo-labeling methods for image classification tasks, producing pseudo labels as supervisory signals. However, when using pseudo labels, there is a l