ﻻ يوجد ملخص باللغة العربية
Introducing explicit constraints on the structural predictions has been an effective way to improve the performance of semantic segmentation models. Existing methods are mainly based on insufficient hand-crafted rules that only partially capture the image structure, and some methods can also suffer from the efficiency issue. As a result, most of the state-of-the-art fully convolutional networks did not adopt these techniques. In this work, we propose a simple, fast yet effective method that exploits structural information through direct supervision with minor additional expense. To be specific, our method explicitly requires the network to predict semantic segmentation as well as dilated affinity, which is a sparse version of pair-wise pixel affinity. The capability of telling the relationships between pixels are directly built into the model and enhance the quality of segmentation in two stages. 1) Joint training with dilated affinity can provide robust feature representations and thus lead to finer segmentation results. 2) The extra output of affinity information can be further utilized to refine the original segmentation with a fast propagation process. Consistent improvements are observed on various benchmark datasets when applying our framework to the existing state-of-the-art model. Codes will be released soon.
Most of the modern instance segmentation approaches fall into two categories: region-based approaches in which object bounding boxes are detected first and later used in cropping and segmenting instances; and keypoint-based approaches in which indivi
Semantic segmentation with dense pixel-wise annotation has achieved excellent performance thanks to deep learning. However, the generalization of semantic segmentation in the wild remains challenging. In this paper, we address the problem of unsuperv
Dilated convolutions are widely used in deep semantic segmentation models as they can enlarge the filters receptive field without adding additional weights nor sacrificing spatial resolution. However, as dilated convolutional filters do not possess p
We introduce a fast and efficient convolutional neural network, ESPNet, for semantic segmentation of high resolution images under resource constraints. ESPNet is based on a new convolutional module, efficient spatial pyramid (ESP), which is efficient
Weakly supervised semantic segmentation is receiving great attention due to its low human annotation cost. In this paper, we aim to tackle bounding box supervised semantic segmentation, i.e., training accurate semantic segmentation models using bound