ﻻ يوجد ملخص باللغة العربية
Semantic image segmentation, which becomes one of the key applications in image processing and computer vision domain, has been used in multiple domains such as medical area and intelligent transportation. Lots of benchmark datasets are released for researchers to verify their algorithms. Semantic segmentation has been studied for many years. Since the emergence of Deep Neural Network (DNN), segmentation has made a tremendous progress. In this paper, we divide semantic image segmentation methods into two categories: traditional and recent DNN method. Firstly, we briefly summarize the traditional method as well as datasets released for segmentation, then we comprehensively investigate recent methods based on DNN which are described in the eight aspects: fully convolutional network, upsample ways, FCN joint with CRF methods, dilated convolution approaches, progresses in backbone network, pyramid methods, Multi-level feature and multi-stage method, supervised, weakly-supervised and unsupervised methods. Finally, a conclusion in this area is drawn.
Semantic segmentation with deep learning has achieved great progress in classifying the pixels in the image. However, the local location information is usually ignored in the high-level feature extraction by the deep learning, which is important for
In this work, we evaluate the use of superpixel pooling layers in deep network architectures for semantic segmentation. Superpixel pooling is a flexible and efficient replacement for other pooling strategies that incorporates spatial prior informatio
The balance between high accuracy and high speed has always been a challenging task in semantic image segmentation. Compact segmentation networks are more widely used in the case of limited resources, while their performances are constrained. In this
Transformers have shown impressive performance in various natural language processing and computer vision tasks, due to the capability of modeling long-range dependencies. Recent progress has demonstrated to combine such transformers with CNN-based s
Co-occurrent visual pattern makes aggregating contextual information a common paradigm to enhance the pixel representation for semantic image segmentation. The existing approaches focus on modeling the context from the perspective of the whole image,