Bounding box regression is an important component of object detection. Recent work has shown promising performance by optimizing the Intersection over Union (IoU) as the loss. However, IoU-based losses suffer from vanishing gradients when the bounding boxes overlap little or not at all, so the model can easily ignore these simple cases. In this paper, we propose the Side Overlap (SO) loss, which maximizes the side overlap of two bounding boxes and therefore imposes a larger penalty on low-overlap cases. In addition, to speed up convergence, a Corner Distance (CD) term is added to the objective function. Combining Side Overlap and Corner Distance yields a new regression objective, the Side and Corner Align Loss (SCALoss). SCALoss is well correlated with IoU loss, so it also benefits the evaluation metric, but it produces a larger penalty in low-overlap cases. It can serve as a comprehensive similarity measure, leading to better localization performance and faster convergence. Experiments on the COCO and PASCAL VOC benchmarks show that SCALoss brings consistent improvements and outperforms $\ell_n$ losses and IoU-based losses with popular object detectors such as YOLOv3, SSD, RepPoints, and Faster R-CNN.
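The abstract describes the two loss terms only in words. The following is a minimal, non-authoritative Python sketch under assumed definitions, not the paper's implementation. Side overlap is taken as the signed overlap along each axis divided by the extent of the union span on that axis, summed over x and y, so that the SO loss is 2 - SO. Corner distance is taken as the squared distances between matching top-left and bottom-right corners, normalized by the squared diagonal of the smallest enclosing box. The function names and the weight `alpha` are hypothetical. The usage at the bottom shows the point the abstract makes: IoU loss stays at 1 for every non-overlapping prediction, while the sketched side-overlap term keeps growing as the prediction moves farther from the ground truth.

```python
from typing import Tuple

# Box format assumed here: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Standard Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def iou_loss(pred: Box, gt: Box) -> float:
    # Constant (1.0) whenever the boxes do not overlap, hence zero gradient there.
    return 1.0 - iou(pred, gt)


def side_overlap_loss(pred: Box, gt: Box) -> float:
    """Assumed SO loss: 2 minus the summed per-axis signed overlap ratios."""
    # Signed overlap per axis: negative when the boxes are separated on that axis.
    w_min = min(pred[2], gt[2]) - max(pred[0], gt[0])
    w_max = max(pred[2], gt[2]) - min(pred[0], gt[0])
    h_min = min(pred[3], gt[3]) - max(pred[1], gt[1])
    h_max = max(pred[3], gt[3]) - min(pred[1], gt[1])
    so = w_min / w_max + h_min / h_max  # equals 2.0 for identical boxes
    return 2.0 - so


def corner_distance_loss(pred: Box, gt: Box) -> float:
    """Assumed CD loss: corner distances normalized by the enclosing-box diagonal."""
    enc_w = max(pred[2], gt[2]) - min(pred[0], gt[0])
    enc_h = max(pred[3], gt[3]) - min(pred[1], gt[1])
    diag_sq = enc_w ** 2 + enc_h ** 2
    d_top_left = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d_bottom_right = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    return (d_top_left + d_bottom_right) / diag_sq


def sca_loss(pred: Box, gt: Box, alpha: float = 0.2) -> float:
    # alpha is a hypothetical weight for the CD term, not a value from the paper.
    return side_overlap_loss(pred, gt) + alpha * corner_distance_loss(pred, gt)


if __name__ == "__main__":
    gt: Box = (0.0, 0.0, 2.0, 2.0)
    near: Box = (3.0, 0.0, 5.0, 2.0)  # no overlap, 1 unit gap
    far: Box = (6.0, 0.0, 8.0, 2.0)   # no overlap, 4 unit gap
    for name, pred in (("near", near), ("far", far)):
        print(f"{name}: IoU loss {iou_loss(pred, gt):.3f}, "
              f"SO loss {side_overlap_loss(pred, gt):.3f}, "
              f"SCALoss {sca_loss(pred, gt):.3f}")
```

Keeping the per-axis overlap signed (rather than clamping it at zero as the IoU intersection does) is what lets the sketched loss distinguish a near miss from a far one.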
Most existing trackers rely on a classifier and multi-scale estimation to estimate the target state. Consequently, and as expected, trackers have become more stable while tracking accuracy has stagnated. While trackers adopt a maximum over
Large-scale object detection datasets (e.g., MS-COCO) try to define the ground-truth bounding boxes as clearly as possible. However, we observe that ambiguities are still introduced when labeling the bounding boxes. In this paper, we propose a novel bo
We address the problem of estimating the pose of a person's head from an RGB image. The use of CNNs for this problem has contributed to significant improvements in accuracy in recent works. However, we show that the following two methods, despite thei
Convolutional neural networks (CNNs) have shown great performance as general feature representations for object recognition applications. However, for multi-label images that contain multiple objects from different categories, scales and locations, g
We present a method for 3D object detection and pose estimation from a single image. In contrast to current techniques that only regress the 3D orientation of an object, our method first regresses relatively stable 3D object properties using a deep c