Contrastive Learning for Label-Efficient Semantic Segmentation

224 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Raviteja Vemulapalli

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Xiangyun Zhao - Raviteja Vemulapalli - Philip Mansfield

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Collecting labeled data for the task of semantic segmentation is expensive and time-consuming, as it requires dense pixel-level annotations. While recent Convolutional Neural Network (CNN) based semantic segmentation approaches have achieved impressive results by using large amounts of labeled training data, their performance drops significantly as the amount of labeled data decreases. This happens because deep CNNs trained with the de facto cross-entropy loss can easily overfit to small amounts of labeled data. To address this issue, we propose a simple and effective contrastive learning-based training strategy in which we first pretrain the network using a pixel-wise, label-based contrastive loss, and then fine-tune it using the cross-entropy loss. This approach increases intra-class compactness and inter-class separability, thereby resulting in a better pixel classifier. We demonstrate the effectiveness of the proposed training strategy using the Cityscapes and PASCAL VOC 2012 segmentation datasets. Our results show that pretraining with the proposed contrastive loss results in large performance gains (more than 20% absolute improvement in some settings) when the amount of labeled data is limited. In many settings, the proposed contrastive pretraining strategy, which does not use any additional data, is able to match or outperform the widely-used ImageNet pretraining strategy that uses more than a million additional labeled images.

قيم البحث

101 - Ho Hin Lee , Yucheng Tang , Qi Yang 2021

Contrastive learning has shown superior performance in embedding global and spatial invariant features in computer vision (e.g., image classification). However, its overall success of embedding local and spatial variant features is still limited, esp ecially for semantic segmentation. In a per-pixel prediction task, more than one label can exist in a single image for segmentation (e.g., an image contains both cat, dog, and grass), thereby it is difficult to define positive or negative pairs in a canonical contrastive learning setting. In this paper, we propose an attention-guided supervised contrastive learning approach to highlight a single semantic object every time as the target. With our design, the same image can be embedded to different semantic clusters with semantic attention (i.e., coerce semantic masks) as an additional input channel. To achieve such attention, a novel two-stage training strategy is presented. We evaluate the proposed method on multi-organ medical image segmentation task, as our major task, with both in-house data and BTCV 2015 datasets. Comparing with the supervised and semi-supervised training state-of-the-art in the backbone of ResNet-50, our proposed pipeline yields substantial improvement of 5.53% and 6.09% in Dice score for both medical image segmentation cohorts respectively. The performance of the proposed method on natural images is assessed via PASCAL VOC 2012 dataset, and achieves 2.75% substantial improvement.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي

Semi-supervised Contrastive Learning for Label-efficient Medical Image Segmentation

191 - Xinrong Hu , Dewen Zeng , Xiaowei Xu 2021

The success of deep learning methods in medical image segmentation tasks heavily depends on a large amount of labeled data to supervise the training. On the other hand, the annotation of biomedical images requires domain knowledge and can be laboriou s. Recently, contrastive learning has demonstrated great potential in learning latent representation of images even without any label. Existing works have explored its application to biomedical image segmentation where only a small portion of data is labeled, through a pre-training phase based on self-supervised contrastive learning without using any labels followed by a supervised fine-tuning phase on the labeled portion of data only. In this paper, we establish that by including the limited label in formation in the pre-training phase, it is possible to boost the performance of contrastive learning. We propose a supervised local contrastive loss that leverages limited pixel-wise annotation to force pixels with the same label to gather around in the embedding space. Such loss needs pixel-wise computation which can be expensive for large images, and we further propose two strategies, downsampling and block division, to address the issue. We evaluate our methods on two public biomedical image datasets of different modalities. With different amounts of labeled data, our methods consistently outperform the state-of-the-art contrast-based methods and other semi-supervised learning techniques.

الرؤية الحاسوبية وتمييز الأنماط

Exploiting Negative Learning for Implicit Pseudo Label Rectification in Source-Free Domain Adaptive Semantic Segmentation

111 - Xin Luo , Wei Chen , Yusong Tan 2021

It is desirable to transfer the knowledge stored in a well-trained source model onto non-annotated target domain in the absence of source data. However, state-of-the-art methods for source free domain adaptation (SFDA) are subject to strict limits: 1 ) access to internal specifications of source models is a must; and 2) pseudo labels should be clean during self-training, making critical tasks relying on semantic segmentation unreliable. Aiming at these pitfalls, this study develops a domain adaptive solution to semantic segmentation with pseudo label rectification (namely textit{PR-SFDA}), which operates in two phases: 1) textit{Confidence-regularized unsupervised learning}: Maximum squares loss applies to regularize the target model to ensure the confidence in prediction; and 2) textit{Noise-aware pseudo label learning}: Negative learning enables tolerance to noisy pseudo labels in training, meanwhile positive learning achieves fast convergence. Extensive experiments have been performed on domain adaptive semantic segmentation benchmark, textit{GTA5 $to$ Cityscapes}. Overall, textit{PR-SFDA} achieves a performance of 49.0 mIoU, which is very close to that of the state-of-the-art counterparts. Note that the latter demand accesses to the source models internal specifications, whereas the textit{PR-SFDA} solution needs none as a sharp contrast.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

Domain Adaptation for Semantic Segmentation via Patch-Wise Contrastive Learning

335 - Weizhe Liu , David Ferstl , Samuel Schulter 2021

We introduce a novel approach to unsupervised and semi-supervised domain adaptation for semantic segmentation. Unlike many earlier methods that rely on adversarial learning for feature alignment, we leverage contrastive learning to bridge the domain gap by aligning the features of structurally similar label patches across domains. As a result, the networks are easier to train and deliver better performance. Our approach consistently outperforms state-of-the-art unsupervised and semi-supervised methods on two challenging domain adaptive segmentation tasks, particularly with a small number of target domain annotations. It can also be naturally extended to weakly-supervised domain adaptation, where only a minor drop in accuracy can save up to 75% of annotation cost.

الرؤية الحاسوبية وتمييز الأنماط

Segmenter: Transformer for Semantic Segmentation

126 - Robin Strudel , Ricardo Garcia , Ivan Laptev 2021

Image segmentation is often ambiguous at the level of individual image patches and requires contextual information to reach label consensus. In this paper we introduce Segmenter, a transformer model for semantic segmentation. In contrast to convoluti on-based methods, our approach allows to model global context already at the first layer and throughout the network. We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation. To do so, we rely on the output embeddings corresponding to image patches and obtain class labels from these embeddings with a point-wise linear decoder or a mask transformer decoder. We leverage models pre-trained for image classification and show that we can fine-tune them on moderate sized datasets available for semantic segmentation. The linear decoder allows to obtain excellent results already, but the performance can be further improved by a mask transformer generating class masks. We conduct an extensive ablation study to show the impact of the different parameters, in particular the performance is better for large models and small patch sizes. Segmenter attains excellent results for semantic segmentation. It outperforms the state of the art on both ADE20K and Pascal Context datasets and is competitive on Cityscapes.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي