ﻻ يوجد ملخص باللغة العربية
Object detection when provided image-level labels instead of instance-level labels (i.e., bounding boxes) during training is an important problem in computer vision, since large scale image datasets with instance-level labels are extremely costly to obtain. In this paper, we address this challenging problem by developing an Expectation-Maximization (EM) based object detection method using deep convolutional neural networks (CNNs). Our method is applicable to both the weakly-supervised and semi-supervised settings. Extensive experiments on PASCAL VOC 2007 benchmark show that (1) in the weakly supervised setting, our method provides significant detection performance improvement over current state-of-the-art methods, (2) having access to a small number of strongly (instance-level) annotated images, our method can almost match the performace of the fully supervised Fast RCNN. We share our source code at https://github.com/ZiangYan/EM-WSD.
Weakly-supervised action localization requires training a model to localize the action segments in the video given only video level action label. It can be solved under the Multiple Instance Learning (MIL) framework, where a bag (video) contains mult
The performance of object detection, to a great extent, depends on the availability of large annotated datasets. To alleviate the annotation cost, the research community has explored a number of ways to exploit unlabeled or weakly labeled data. Howev
We propose a novel point annotated setting for the weakly semi-supervised object detection task, in which the dataset comprises small fully annotated images and large weakly annotated images by points. It achieves a balance between tremendous annotat
A large gap exists between fully-supervised object detection and weakly-supervised object detection. To narrow this gap, some methods consider knowledge transfer from additional fully-supervised dataset. But these methods do not fully exploit discrim
Significant performance improvement has been achieved for fully-supervised video salient object detection with the pixel-wise labeled training datasets, which are time-consuming and expensive to obtain. To relieve the burden of data annotation, we pr