Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

79 0 0.0 ( 0 )

Download Cite

Added by Enze Xie

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Enze Xie - Wenhai Wang - Zhiding Yu

Computer Vision and Pattern Recognition Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perception (MLP) decoders. SegFormer has two appealing features: 1) SegFormer comprises a novel hierarchically structured Transformer encoder which outputs multiscale features. It does not need positional encoding, thereby avoiding the interpolation of positional codes which leads to decreased performance when the testing resolution differs from training. 2) SegFormer avoids complex decoders. The proposed MLP decoder aggregates information from different layers, and thus combining both local attention and global attention to render powerful representations. We show that this simple and lightweight design is the key to efficient segmentation on Transformers. We scale our approach up to obtain a series of models from SegFormer-B0 to SegFormer-B5, reaching significantly better performance and efficiency than previous counterparts. For example, SegFormer-B4 achieves 50.3% mIoU on ADE20K with 64M parameters, being 5x smaller and 2.2% better than the previous best method. Our best model, SegFormer-B5, achieves 84.0% mIoU on Cityscapes validation set and shows excellent zero-shot robustness on Cityscapes-C. Code will be released at: github.com/NVlabs/SegFormer.

rate research

Efficient semantic image segmentation with superpixel pooling

268 - Mathijs Schuurmans , Maxim Berman , Matthew B. Blaschko 2018

In this work, we evaluate the use of superpixel pooling layers in deep network architectures for semantic segmentation. Superpixel pooling is a flexible and efficient replacement for other pooling strategies that incorporates spatial prior information. We propose a simple and efficient GPU-implementation of the layer and explore several designs for the integration of the layer into existing network architectures. We provide experimental results on the IBSR and Cityscapes dataset, demonstrating that superpixel pooling can be leveraged to consistently increase network accuracy with minimal computational overhead. Source code is available at https://github.com/bermanmaxim/superpixPool

Computer Vision and Pattern Recognition Machine Learning

Boosting Few-shot Semantic Segmentation with Transformers

145 - Guolei Sun , Yun Liu , Jingyun Liang 2021

Due to the fact that fully supervised semantic segmentation methods require sufficient fully-labeled data to work well and can not generalize to unseen classes, few-shot segmentation has attracted lots of research attention. Previous arts extract features from support and query images, which are processed jointly before making predictions on query images. The whole process is based on convolutional neural networks (CNN), leading to the problem that only local information is used. In this paper, we propose a TRansformer-based Few-shot Semantic segmentation method (TRFS). Specifically, our model consists of two modules: Global Enhancement Module (GEM) and Local Enhancement Module (LEM). GEM adopts transformer blocks to exploit global information, while LEM utilizes conventional convolutions to exploit local information, across query and support features. Both GEM and LEM are complementary, helping to learn better feature representations for segmenting query images. Extensive experiments on PASCAL-5i and COCO datasets show that our approach achieves new state-of-the-art performance, demonstrating its effectiveness.

Computer Vision and Pattern Recognition

Efficient Segmentation: Learning Downsampling Near Semantic Boundaries

400 - Dmitrii Marin , Zijian He , Peter Vajda 2019

Many automated processes such as auto-piloting rely on a good semantic segmentation as a critical component. To speed up performance, it is common to downsample the input frame. However, this comes at the cost of missed small objects and reduced accuracy at semantic boundaries. To address this problem, we propose a new content-adaptive downsampling technique that learns to favor sampling locations near semantic boundaries of target classes. Cost-performance analysis shows that our method consistently outperforms the uniform sampling improving balance between accuracy and computational efficiency. Our adaptive sampling gives segmentation with better quality of boundaries and more reliable support for smaller-size objects.

Computer Vision and Pattern Recognition Machine Learning

Design of Real-time Semantic Segmentation Decoder for Automated Driving

72 - Arindam Das , Saranya Kandan , Senthil Yogamani 2019

Semantic segmentation remains a computationally intensive algorithm for embedded deployment even with the rapid growth of computation power. Thus efficient network design is a critical aspect especially for applications like automated driving which requires real-time performance. Recently, there has been a lot of research on designing efficient encoders that are mostly task agnostic. Unlike image classification and bounding box object detection tasks, decoders are computationally expensive as well for semantic segmentation task. In this work, we focus on efficient design of the segmentation decoder and assume that an efficient encoder is already designed to provide shared features for a multi-task learning system. We design a novel efficient non-bottleneck layer and a family of decoders which fit into a small run-time budget using VGG10 as efficient encoder. We demonstrate in our dataset that experimentation with various design choices led to an improvement of 10% from a baseline performance.

Computer Vision and Pattern Recognition Machine Learning Machine Learning

Sample Efficient Semantic Segmentation using Rotation Equivariant Convolutional Networks

210 - Jasper Linmans , Jim Winkens , Bastiaan S. Veeling 2018

We propose a semantic segmentation model that exploits rotation and reflection symmetries. We demonstrate significant gains in sample efficiency due to increased weight sharing, as well as improvements in robustness to symmetry transformations. The group equivariant CNN framework is extended for segmentation by introducing a new equivariant (G->Z2)-convolution that transforms feature maps on a group to planar feature maps. Also, equivariant transposed convolution is formulated for up-sampling in an encoder-decoder network. To demonstrate improvements in sample efficiency we evaluate on multiple data regimes of a rotation-equivariant segmentation task: cancer metastases detection in histopathology images. We further show the effectiveness of exploiting more symmetries by varying the size of the group.

Computer Vision and Pattern Recognition Machine Learning Machine Learning

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions