ﻻ يوجد ملخص باللغة العربية
Due to the fact that fully supervised semantic segmentation methods require sufficient fully-labeled data to work well and can not generalize to unseen classes, few-shot segmentation has attracted lots of research attention. Previous arts extract features from support and query images, which are processed jointly before making predictions on query images. The whole process is based on convolutional neural networks (CNN), leading to the problem that only local information is used. In this paper, we propose a TRansformer-based Few-shot Semantic segmentation method (TRFS). Specifically, our model consists of two modules: Global Enhancement Module (GEM) and Local Enhancement Module (LEM). GEM adopts transformer blocks to exploit global information, while LEM utilizes conventional convolutions to exploit local information, across query and support features. Both GEM and LEM are complementary, helping to learn better feature representations for segmenting query images. Extensive experiments on PASCAL-5i and COCO datasets show that our approach achieves new state-of-the-art performance, demonstrating its effectiveness.
Few-shot semantic segmentation models aim to segment images after learning from only a few annotated examples. A key challenge for them is overfitting. Prior works usually limit the overall model capacity to alleviate overfitting, but the limited cap
Despite the great progress made by deep CNNs in image semantic segmentation, they typically require a large number of densely-annotated images for training and are difficult to generalize to unseen object categories. Few-shot segmentation has thus be
Despite the great progress made by deep neural networks in the semantic segmentation task, traditional neural-networkbased methods typically suffer from a shortage of large amounts of pixel-level annotations. Recent progress in fewshot semantic segme
This paper aims to address few-shot semantic segmentation. While existing prototype-based methods have achieved considerable success, they suffer from uncertainty and ambiguity caused by limited labelled examples. In this work, we propose attentional
Few-shot segmentation is challenging because objects within the support and query images could significantly differ in appearance and pose. Using a single prototype acquired directly from the support image to segment the query image causes semantic a