ﻻ يوجد ملخص باللغة العربية
The performance of object detection, to a great extent, depends on the availability of large annotated datasets. To alleviate the annotation cost, the research community has explored a number of ways to exploit unlabeled or weakly labeled data. However, such efforts have met with limited success so far. In this work, we revisit the problem with a pragmatic standpoint, trying to explore a new balance between detection performance and annotation cost by jointly exploiting fully and weakly annotated data. Specifically, we propose a weakly- and semi-supervised object detection framework (WSSOD), which involves a two-stage learning procedure. An agent detector is first trained on a joint dataset and then used to predict pseudo bounding boxes on weakly-annotated images. The underlying assumptions in the current as well as common semi-supervised pipelines are also carefully examined under a unified EM formulation. On top of this framework, weakly-supervised loss (WSL), label attention and random pseudo-label sampling (RPS) strategies are introduced to relax these assumptions, bringing additional improvement on the efficacy of the detection pipeline. The proposed framework demonstrates remarkable performance on PASCAL-VOC and MSCOCO benchmark, achieving a high performance comparable to those obtained in fully-supervised settings, with only one third of the annotations.
Object detection when provided image-level labels instead of instance-level labels (i.e., bounding boxes) during training is an important problem in computer vision, since large scale image datasets with instance-level labels are extremely costly to
We propose a novel point annotated setting for the weakly semi-supervised object detection task, in which the dataset comprises small fully annotated images and large weakly annotated images by points. It achieves a balance between tremendous annotat
Weakly-Supervised Object Detection (WSOD) and Localization (WSOL), i.e., detecting multiple and single instances with bounding boxes in an image using image-level labels, are long-standing and challenging tasks in the CV community. With the success o
Semi-supervised learning, i.e., training networks with both labeled and unlabeled data, has made significant progress recently. However, existing works have primarily focused on image classification tasks and neglected object detection which requires
Significant performance improvement has been achieved for fully-supervised video salient object detection with the pixel-wise labeled training datasets, which are time-consuming and expensive to obtain. To relieve the burden of data annotation, we pr