ترغب بنشر مسار تعليمي؟ اضغط هنا

Boosting Few-shot Semantic Segmentation with Transformers

146   0   0.0 ( 0 )
 نشر من قبل Guolei Sun
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Due to the fact that fully supervised semantic segmentation methods require sufficient fully-labeled data to work well and can not generalize to unseen classes, few-shot segmentation has attracted lots of research attention. Previous arts extract features from support and query images, which are processed jointly before making predictions on query images. The whole process is based on convolutional neural networks (CNN), leading to the problem that only local information is used. In this paper, we propose a TRansformer-based Few-shot Semantic segmentation method (TRFS). Specifically, our model consists of two modules: Global Enhancement Module (GEM) and Local Enhancement Module (LEM). GEM adopts transformer blocks to exploit global information, while LEM utilizes conventional convolutions to exploit local information, across query and support features. Both GEM and LEM are complementary, helping to learn better feature representations for segmenting query images. Extensive experiments on PASCAL-5i and COCO datasets show that our approach achieves new state-of-the-art performance, demonstrating its effectiveness.



قيم البحث

اقرأ أيضاً

Few-shot semantic segmentation models aim to segment images after learning from only a few annotated examples. A key challenge for them is overfitting. Prior works usually limit the overall model capacity to alleviate overfitting, but the limited cap acity also hampers the segmentation accuracy. We instead propose a method that increases the overall model capacity by supplementing class-specific features with objectness, which is class-agnostic and so not prone to overfitting. Extensive experiments demonstrate the versatility of our method with multiple backbone models (ResNet-50, ResNet-101 and HRNetV2-W48) and existing base architectures (DENet and PFENet). Given only one annotated example of an unseen category, experiments show that our method outperforms state-of-art methods with respect to mIoU by at least 4.7% and 1.5% on PASCAL-5i and COCO-20i respectively.
Despite the great progress made by deep CNNs in image semantic segmentation, they typically require a large number of densely-annotated images for training and are difficult to generalize to unseen object categories. Few-shot segmentation has thus be en developed to learn to perform segmentation from only a few annotated examples. In this paper, we tackle the challenging few-shot segmentation problem from a metric learning perspective and present PANet, a novel prototype alignment network to better utilize the information of the support set. Our PANet learns class-specific prototype representations from a few support images within an embedding space and then performs segmentation over the query images through matching each pixel to the learned prototypes. With non-parametric metric learning, PANet offers high-quality prototypes that are representative for each semantic class and meanwhile discriminative for different classes. Moreover, PANet introduces a prototype alignment regularization between support and query. With this, PANet fully exploits knowledge from the support and provides better generalization on few-shot segmentation. Significantly, our model achieves the mIoU score of 48.1% and 55.7% on PASCAL-5i for 1-shot and 5-shot settings respectively, surpassing the state-of-the-art method by 1.8% and 8.6%.
Despite the great progress made by deep neural networks in the semantic segmentation task, traditional neural-networkbased methods typically suffer from a shortage of large amounts of pixel-level annotations. Recent progress in fewshot semantic segme ntation tackles the issue by only a few pixel-level annotated examples. However, these few-shot approaches cannot easily be applied to multi-way or weak annotation settings. In this paper, we advance the few-shot segmentation paradigm towards a scenario where image-level annotations are available to help the training process of a few pixel-level annotations. Our key idea is to learn a better prototype representation of the class by fusing the knowledge from the image-level labeled data. Specifically, we propose a new framework, called PAIA, to learn the class prototype representation in a metric space by integrating image-level annotations. Furthermore, by considering the uncertainty of pseudo-masks, a distilled soft masked average pooling strategy is designed to handle distractions in image-level annotations. Extensive empirical results on two datasets show superior performance of PAIA.
This paper aims to address few-shot semantic segmentation. While existing prototype-based methods have achieved considerable success, they suffer from uncertainty and ambiguity caused by limited labelled examples. In this work, we propose attentional prototype inference (API), a probabilistic latent variable framework for few-shot semantic segmentation. We define a global latent variable to represent the prototype of each object category, which we model as a probabilistic distribution. The probabilistic modeling of the prototype enhances the models generalization ability by handling the inherent uncertainty caused by limited data and intra-class variations of objects. To further enhance the model, we introduce a local latent variable to represent the attention map of each query image, which enables the model to attend to foreground objects while suppressing background. The optimization of the proposed model is formulated as a variational Bayesian inference problem, which is established by amortized inference networks.We conduct extensive experiments on three benchmarks, where our proposal obtains at least competitive and often better performance than state-of-the-art methods. We also provide comprehensive analyses and ablation studies to gain insight into the effectiveness of our method for few-shot semantic segmentation.
95 - Boyu Yang , Chang Liu , Bohao Li 2020
Few-shot segmentation is challenging because objects within the support and query images could significantly differ in appearance and pose. Using a single prototype acquired directly from the support image to segment the query image causes semantic a mbiguity. In this paper, we propose prototype mixture models (PMMs), which correlate diverse image regions with multiple prototypes to enforce the prototype-based semantic representation. Estimated by an Expectation-Maximization algorithm, PMMs incorporate rich channel-wised and spatial semantics from limited support images. Utilized as representations as well as classifiers, PMMs fully leverage the semantics to activate objects in the query image while depressing background regions in a duplex manner. Extensive experiments on Pascal VOC and MS-COCO datasets show that PMMs significantly improve upon state-of-the-arts. Particularly, PMMs improve 5-shot segmentation performance on MS-COCO by up to 5.82% with only a moderate cost for model size and inference speed.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا