ﻻ يوجد ملخص باللغة العربية
The research on recognizing the most discriminative regions provides referential information for weakly supervised object localization with only image-level annotations. However, the most discriminative regions usually conceal the other parts of the object, thereby impeding entire object recognition and localization. To tackle this problem, the Dual-attention Focused Module (DFM) is proposed to enhance object localization performance. Specifically, we present a dual attention module for information fusion, consisting of a position branch and a channel one. In each branch, the input feature map is deduced into an enhancement map and a mask map, thereby highlighting the most discriminative parts or hiding them. For the position mask map, we introduce a focused matrix to enhance it, which utilizes the principle that the pixels of an object are continuous. Between these two branches, the enhancement map is integrated with the mask map, aiming at partially compensating the lost information and diversifies the features. With the dual-attention module and focused matrix, the entire object region could be precisely recognized with implicit information. We demonstrate outperforming results of DFM in experiments. In particular, DFM achieves state-of-the-art performance in localization accuracy in ILSVRC 2016 and CUB-200-2011.
Although recent advances in deep learning accelerated an improvement in a weakly supervised object localization (WSOL) task, there are still challenges to identify the entire body of an object, rather than only discriminative parts. In this paper, we
Weakly Supervised Object Localization (WSOL) techniques learn the object location only using image-level labels, without location annotations. A common limitation for these techniques is that they cover only the most discriminative part of the object
Weakly supervised object localization (WSOL) is a challenging problem when given image category labels but requires to learn object localization models. Optimizing a convolutional neural network (CNN) for classification tends to activate local discri
Weakly supervised semantic segmentation and localiza- tion have a problem of focusing only on the most important parts of an image since they use only image-level annota- tions. In this paper, we solve this problem fundamentally via two-phase learnin
Weakly supervised object localization (WSOL) aims to localize objects by only utilizing image-level labels. Class activation maps (CAMs) are the commonly used features to achieve WSOL. However, previous CAM-based methods did not take full advantage o