ترغب بنشر مسار تعليمي؟ اضغط هنا

Hide-and-Seek: A Data Augmentation Technique for Weakly-Supervised Localization and Beyond

122   0   0.0 ( 0 )
 نشر من قبل Krishna Kumar Singh
 تاريخ النشر 2018
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We propose Hide-and-Seek a general purpose data augmentation technique, which is complementary to existing data augmentation techniques and is beneficial for various visual recognition tasks. The key idea is to hide patches in a training image randomly, in order to force the network to seek other relevant content when the most discriminative content is hidden. Our approach only needs to modify the input image and can work with any network to improve its performance. During testing, it does not need to hide any patches. The main advantage of Hide-and-Seek over existing data augmentation techniques is its ability to improve object localization accuracy in the weakly-supervised setting, and we therefore use this task to motivate the approach. However, Hide-and-Seek is not tied only to the image localization task, and can generalize to other forms of visual input like videos, as well as other recognition tasks like image classification, temporal action localization, semantic segmentation, emotion recognition, age/gender estimation, and person re-identification. We perform extensive experiments to showcase the advantage of Hide-and-Seek on these various visual recognition problems.

قيم البحث

اقرأ أيضاً

We train embodied agents to play Visual Hide and Seek where a prey must navigate in a simulated environment in order to avoid capture from a predator. We place a variety of obstacles in the environment for the prey to hide behind, and we only give th e agents partial observations of their environment using an egocentric perspective. Although we train the model to play this game from scratch, experiments and visualizations suggest that the agent learns to predict its own visibility in the environment. Furthermore, we quantitatively analyze how agent weaknesses, such as slower speed, effect the learned policy. Our results suggest that, although agent weaknesses make the learning problem more challenging, they also cause more useful features to be learned. Our project website is available at: http://www.cs.columbia.edu/ ~bchen/visualhideseek/.
115 - K. Sneppen 2004
Signaling pathways and networks determine the ability to communicate in systems ranging from living cells to human society. We investigate how the network structure constrains communication in social-, man-made and biological networks. We find that h uman networks of governance and collaboration are predictable on teat-a-teat level, reflecting well defined pathways, but globally inefficient. In contrast, the Internet tends to have better overall communication abilities, more alternative pathways, and is therefore more robust. Between these extremes the molecular network of Saccharomyces cerevisea is more similar to the simpler social systems, whereas the pattern of interactions in the more complex Drosophilia melanogaster, resembles the robust Internet.
Data augmentation is vital for deep learning neural networks. By providing massive training samples, it helps to improve the generalization ability of the model. Weakly supervised semantic segmentation (WSSS) is a challenging problem that has been de eply studied in recent years, conventional data augmentation approaches for WSSS usually employ geometrical transformations, random cropping and color jittering. However, merely increasing the same contextual semantic data does not bring much gain to the networks to distinguish the objects, e.g., the correct image-level classification of aeroplane may be not only due to the recognition of the object itself, but also its co-occurrence context like sky, which will cause the model to focus less on the object features. To this end, we present a Context Decoupling Augmentation (CDA) method, to change the inherent context in which the objects appear and thus drive the network to remove the dependence between object instances and contextual information. To validate the effectiveness of the proposed method, extensive experiments on PASCAL VOC 2012 dataset with several alternative network architectures demonstrate that CDA can boost various popular WSSS methods to the new state-of-the-art by a large margin.
Weakly-Supervised Object Detection (WSOD) and Localization (WSOL), i.e., detecting multiple and single instances with bounding boxes in an image using image-level labels, are long-standing and challenging tasks in the CV community. With the success o f deep neural networks in object detection, both WSOD and WSOL have received unprecedented attention. Hundreds of WSOD and WSOL methods and numerous techniques have been proposed in the deep learning era. To this end, in this paper, we consider WSOL is a sub-task of WSOD and provide a comprehensive survey of the recent achievements of WSOD. Specifically, we firstly describe the formulation and setting of the WSOD, including the background, challenges, basic framework. Meanwhile, we summarize and analyze all advanced techniques and training tricks for improving detection performance. Then, we introduce the widely-used datasets and evaluation metrics of WSOD. Lastly, we discuss the future directions of WSOD. We believe that these summaries can help pave a way for future research on WSOD and WSOL.
Weakly-supervised object localization (WSOL) has gained popularity over the last years for its promise to train localization models with only image-level labels. Since the seminal WSOL work of class activation mapping (CAM), the field has focused on how to expand the attention regions to cover objects more broadly and localize them better. However, these strategies rely on full localization supervision for validating hyperparameters and model selection, which is in principle prohibited under the WSOL setup. In this paper, we argue that WSOL task is ill-posed with only image-level labels, and propose a new evaluation protocol where full supervision is limited to only a small held-out set not overlapping with the test set. We observe that, under our protocol, the five most recent WSOL methods have not made a major improvement over the CAM baseline. Moreover, we report that existing WSOL methods have not reached the few-shot learning baseline, where the full-supervision at validation time is used for model training instead. Based on our findings, we discuss some future directions for WSOL.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا