SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers


Published by: Enze Xie

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Enze Xie, Wenhai Wang, Zhiding Yu

Computer Vision and Pattern Recognition, Machine Learning


We present SegFormer, a simple, efficient, yet powerful semantic segmentation framework that unifies Transformers with lightweight multilayer perceptron (MLP) decoders. SegFormer has two appealing features: 1) SegFormer comprises a novel hierarchically structured Transformer encoder that outputs multiscale features. It does not need positional encoding, thereby avoiding the interpolation of positional codes, which degrades performance when the testing resolution differs from the training resolution. 2) SegFormer avoids complex decoders. The proposed MLP decoder aggregates information from different layers, thus combining both local attention and global attention to render powerful representations. We show that this simple and lightweight design is the key to efficient segmentation on Transformers. We scale our approach up to obtain a series of models from SegFormer-B0 to SegFormer-B5, reaching significantly better performance and efficiency than previous counterparts. For example, SegFormer-B4 achieves 50.3% mIoU on ADE20K with 64M parameters, being 5x smaller and 2.2% better than the previous best method. Our best model, SegFormer-B5, achieves 84.0% mIoU on the Cityscapes validation set and shows excellent zero-shot robustness on Cityscapes-C. Code will be released at: github.com/NVlabs/SegFormer.
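The decoder idea in the abstract (aggregate multiscale encoder features with nothing but MLP layers) can be illustrated with a minimal pure-Python sketch. It assumes a per-scale linear projection to a common width, upsampling to the finest resolution, channel concatenation, a fusing linear layer with ReLU, and a final linear classifier. Nearest-neighbor upsampling, the toy shapes, the omission of normalization layers, and all function names here are simplifying assumptions for illustration; this is not the released implementation.

```python
def linear(pixel, weight):
    # weight is a list of output rows, each holding len(pixel) coefficients
    return [sum(w * x for w, x in zip(row, pixel)) for row in weight]

def per_pixel(fmap, weight):
    # fmap is an H x W x C nested list; the same linear layer is applied at every pixel
    return [[linear(px, weight) for px in row] for row in fmap]

def upsample_nearest(fmap, factor):
    # Repeat every pixel `factor` times along both spatial axes
    out = []
    for row in fmap:
        wide = [px for px in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out

def concat_channels(fmaps):
    # Concatenate same-sized feature maps along the channel axis
    height, width = len(fmaps[0]), len(fmaps[0][0])
    return [[[v for f in fmaps for v in f[i][j]] for j in range(width)]
            for i in range(height)]

def mlp_decode(features, proj_weights, fuse_weight, cls_weight):
    # features[0] is assumed to be the finest (largest) scale
    target_h = len(features[0])
    unified = []
    for fmap, weight in zip(features, proj_weights):
        projected = per_pixel(fmap, weight)           # unify channel width
        unified.append(upsample_nearest(projected, target_h // len(fmap)))
    fused = per_pixel(concat_channels(unified), fuse_weight)
    fused = [[[max(0.0, v) for v in px] for px in row] for row in fused]  # ReLU
    return per_pixel(fused, cls_weight)               # per-pixel class logits

# Toy input: a 4x4 map with 1 channel and a 2x2 map with 2 channels (hypothetical data)
features = [
    [[[1.0] for _ in range(4)] for _ in range(4)],
    [[[2.0, 3.0] for _ in range(2)] for _ in range(2)],
]
proj_weights = [[[1.0], [1.0]], [[1.0, 0.0], [0.0, 1.0]]]
fuse_weight = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
cls_weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = mlp_decode(features, proj_weights, fuse_weight, cls_weight)
```

Here `logits` is a 4x4 grid with three class scores per pixel. Because every layer acts on one pixel at a time, the only cross-scale interaction comes from upsampling and concatenation, which is what keeps such a decoder lightweight.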


Mathijs Schuurmans, Maxim Berman, Matthew B. Blaschko (2018)

In this work, we evaluate the use of superpixel pooling layers in deep network architectures for semantic segmentation. Superpixel pooling is a flexible and efficient replacement for other pooling strategies that incorporates spatial prior information. We propose a simple and efficient GPU implementation of the layer and explore several designs for integrating the layer into existing network architectures. We provide experimental results on the IBSR and Cityscapes datasets, demonstrating that superpixel pooling can be leveraged to consistently increase network accuracy with minimal computational overhead. Source code is available at https://github.com/bermanmaxim/superpixPool
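As a rough illustration of what a superpixel pooling layer computes, the sketch below averages per-pixel feature vectors over the pixels that share a superpixel label. Average pooling, the flattened pixel layout, and the function name `superpixel_avg_pool` are assumptions made for this example; the abstract does not specify the reduction, and this is not the authors' GPU implementation.

```python
def superpixel_avg_pool(features, labels, num_superpixels):
    """Average each channel over the pixels assigned to the same superpixel.

    features: list of per-pixel feature vectors (flattened image)
    labels:   list of superpixel ids, one per pixel
    """
    channels = len(features[0])
    sums = [[0.0] * channels for _ in range(num_superpixels)]
    counts = [0] * num_superpixels
    for feat, lab in zip(features, labels):
        counts[lab] += 1
        for c, value in enumerate(feat):
            sums[lab][c] += value
    # Empty superpixels get a zero vector instead of a division by zero
    return [[s / n if n else 0.0 for s in row] for row, n in zip(sums, counts)]

# Four pixels with two channels each, grouped into two superpixels (toy data)
features = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0], [7.0, 40.0]]
labels = [0, 0, 1, 1]
pooled = superpixel_avg_pool(features, labels, 2)
```

The output has one feature vector per superpixel rather than per grid cell, so the pooling regions follow image content instead of a fixed window.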

Computer Vision and Pattern Recognition, Machine Learning

Boosting Few-shot Semantic Segmentation with Transformers

Guolei Sun, Yun Liu, Jingyun Liang (2021)

Because fully supervised semantic segmentation methods require sufficient fully labeled data to work well and cannot generalize to unseen classes, few-shot segmentation has attracted a lot of research attention. Previous methods extract features from support and query images, which are processed jointly before making predictions on query images. The whole process is based on convolutional neural networks (CNNs), leading to the problem that only local information is used. In this paper, we propose a TRansformer-based Few-shot Semantic segmentation method (TRFS). Specifically, our model consists of two modules: the Global Enhancement Module (GEM) and the Local Enhancement Module (LEM). GEM adopts transformer blocks to exploit global information, while LEM utilizes conventional convolutions to exploit local information, across query and support features. GEM and LEM are complementary, helping to learn better feature representations for segmenting query images. Extensive experiments on the PASCAL-5i and COCO datasets show that our approach achieves new state-of-the-art performance, demonstrating its effectiveness.

Computer Vision and Pattern Recognition

Efficient Segmentation: Learning Downsampling Near Semantic Boundaries

Dmitrii Marin, Zijian He, Peter Vajda (2019)

Many automated processes such as auto-piloting rely on good semantic segmentation as a critical component. To speed up processing, it is common to downsample the input frame. However, this comes at the cost of missed small objects and reduced accuracy at semantic boundaries. To address this problem, we propose a new content-adaptive downsampling technique that learns to favor sampling locations near semantic boundaries of target classes. Cost-performance analysis shows that our method consistently outperforms uniform sampling, improving the balance between accuracy and computational efficiency. Our adaptive sampling gives segmentation with better boundary quality and more reliable support for smaller objects.

Computer Vision and Pattern Recognition, Machine Learning

Design of Real-time Semantic Segmentation Decoder for Automated Driving

Arindam Das, Saranya Kandan, Senthil Yogamani (2019)

Semantic segmentation remains a computationally intensive algorithm for embedded deployment even with the rapid growth of computation power. Efficient network design is therefore a critical aspect, especially for applications like automated driving that require real-time performance. Recently, there has been a lot of research on designing efficient encoders that are mostly task agnostic. Unlike in image classification and bounding-box object detection, decoders are also computationally expensive in semantic segmentation. In this work, we focus on the efficient design of the segmentation decoder and assume that an efficient encoder is already designed to provide shared features for a multi-task learning system. We design a novel efficient non-bottleneck layer and a family of decoders that fit into a small run-time budget using VGG10 as an efficient encoder. We demonstrate on our dataset that experimentation with various design choices led to an improvement of 10% over baseline performance.

Computer Vision and Pattern Recognition, Machine Learning

Sample Efficient Semantic Segmentation using Rotation Equivariant Convolutional Networks

Jasper Linmans, Jim Winkens, Bastiaan S. Veeling (2018)

We propose a semantic segmentation model that exploits rotation and reflection symmetries. We demonstrate significant gains in sample efficiency due to increased weight sharing, as well as improvements in robustness to symmetry transformations. The group equivariant CNN framework is extended for segmentation by introducing a new equivariant (G->Z2)-convolution that transforms feature maps on a group to planar feature maps. In addition, an equivariant transposed convolution is formulated for up-sampling in an encoder-decoder network. To demonstrate improvements in sample efficiency, we evaluate on multiple data regimes of a rotation-equivariant segmentation task: cancer metastases detection in histopathology images. We further show the effectiveness of exploiting more symmetries by varying the size of the group.
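To make the equivariance property concrete, the sketch below correlates an image with all four 90-degree rotations of one kernel and takes the maximum over rotations. Rotating the input then simply rotates the output. This is a simplified illustration on the four-rotation group only, with hypothetical function names and toy data; it is not the (G->Z2)-convolution or the transposed convolution proposed in the paper.

```python
def rot90(m):
    """Rotate a 2-D list by 90 degrees counter-clockwise."""
    return [list(r) for r in zip(*m)][::-1]

def correlate(image, kernel):
    """Valid-mode 2-D cross-correlation with a square kernel."""
    k = len(kernel)
    n_rows = len(image) - k + 1
    n_cols = len(image[0]) - k + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(n_cols)] for i in range(n_rows)]

def rotation_pooled_response(image, kernel):
    """Correlate with the four rotated copies of the kernel, then max over them."""
    responses = []
    rotated = kernel
    for _ in range(4):
        responses.append(correlate(image, rotated))
        rotated = rot90(rotated)
    height, width = len(responses[0]), len(responses[0][0])
    return [[max(r[i][j] for r in responses) for j in range(width)]
            for i in range(height)]

# Toy integer data so that the equality check below is exact
image = [[1, 2, 0, 3],
         [0, 1, 4, 1],
         [2, 0, 1, 0],
         [5, 1, 0, 2]]
kernel = [[1, 0],
          [-1, 2]]

out = rotation_pooled_response(image, kernel)
out_of_rotated = rotation_pooled_response(rot90(image), kernel)
is_equivariant = out_of_rotated == rot90(out)
```

Because the four rotated kernels share one set of weights, a single learned filter covers all four orientations, which is the kind of weight sharing the abstract credits for improved sample efficiency.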

Computer Vision and Pattern Recognition, Machine Learning