Three Birds One Stone: A General Architecture for Salient Object Segmentation, Edge Detection and Skeleton Extraction


Published by: Qibin Hou

Publication date: 2018

Research field: Informatics Engineering

Paper language: English

Authors: Qibin Hou, Jiang-Jiang Liu, Ming-Ming Cheng

Computer Vision and Pattern Recognition


Abstract

In this paper, we aim to solve pixel-wise binary problems, including salient object segmentation, skeleton extraction, and edge detection, by introducing a unified architecture. Previous works have proposed tailored methods for solving each of the three tasks independently. Here, we show that these tasks share some similarities that can be exploited to develop a unified framework. In particular, we introduce a horizontal cascade, each component of which is densely connected to the outputs of the previous component. Stringing these components together allows us to exploit features across different levels hierarchically and thus to address the multiple pixel-wise binary regression tasks effectively. To assess the performance of our proposed network on these tasks, we carry out exhaustive evaluations on multiple representative datasets. Although these tasks are inherently very different, we show that our unified approach performs very well on all of them and works far better than current single-purpose state-of-the-art methods. All the code in this paper will be publicly available.


Zhengzheng Tu, Yan Ma, Chenglong Li, 2019

Fully Convolutional Neural Networks (FCNs) have recently been widely applied to salient object detection by virtue of their high-level semantic feature extraction, but existing FCN-based methods still suffer from continuous striding and pooling operations, leading to loss of spatial structure and blurred edges. To maintain the clear edge structure of salient objects, we propose a novel Edge-guided Non-local FCN (ENFNet) to perform edge-guided feature learning for accurate salient object detection. Specifically, we extract hierarchical global and local information in the FCN to incorporate non-local features for effective feature representations. To preserve good boundaries of salient objects, we propose a guidance block to embed edge prior knowledge into hierarchical feature maps. The guidance block performs not only feature-wise manipulation but also spatial-wise transformation for effective edge embeddings. Our model is trained on the MSRA-B dataset and tested on five popular benchmark datasets. Compared with the state-of-the-art methods, the proposed method achieves the best performance on all datasets.

Computer Vision and Pattern Recognition

BiconNet: An Edge-preserved Connectivity-based Approach for Salient Object Detection

Ziyun Yang, Somayyeh Soltanian-Zadeh, Sina Farsiu, 2021

Salient object detection (SOD) is viewed as a pixel-wise saliency modeling task by traditional deep learning-based methods. A limitation of current SOD models is insufficient utilization of inter-pixel information, which usually results in imperfect segmentation near edge regions and low spatial coherence. As we demonstrate, using a saliency mask as the only label is suboptimal. To address this limitation, we propose a connectivity-based approach called bilateral connectivity network (BiconNet), which uses connectivity masks together with saliency masks as labels for effective modeling of inter-pixel relationships and object saliency. Moreover, we propose a bilateral voting module to enhance the output connectivity map, and a novel edge feature enhancement method that efficiently utilizes edge-specific features. Through comprehensive experiments on five benchmark datasets, we demonstrate that our proposed method can be plugged into any existing state-of-the-art saliency-based SOD framework to improve its performance with negligible parameter increase.
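One way to picture a connectivity mask used alongside a saliency mask is sketched below. This is only an illustration under stated assumptions, not the BiconNet implementation: it assumes an 8-neighbour layout in which channel k marks a pixel as connected when both the pixel and its neighbour in direction k are salient. The names `OFFSETS` and `connectivity_mask` are hypothetical.

```python
# Hypothetical 8-neighbour offsets (dy, dx); the ordering is an assumption.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def connectivity_mask(mask):
    """Build an 8-channel connectivity label from a binary saliency mask.

    mask: H x W nested list of 0/1 values.
    Returns a list of 8 channels, each an H x W nested list of 0/1 values.
    Channel k is 1 at (y, x) when mask[y][x] and its neighbour in
    direction OFFSETS[k] are both 1 (assumed definition).
    """
    h, w = len(mask), len(mask[0])
    channels = []
    for dy, dx in OFFSETS:
        channel = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < h and 0 <= nx < w
                if inside and mask[y][x] and mask[ny][nx]:
                    channel[y][x] = 1
        channels.append(channel)
    return channels


# Usage: two salient pixels side by side in the top row of a 2 x 2 mask.
example = connectivity_mask([[1, 1], [0, 0]])
```

In this toy mask only the left-to-right and right-to-left channels fire, which is the kind of inter-pixel relationship a single saliency mask does not encode explicitly.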

Computer Vision and Pattern Recognition; Artificial Intelligence; Image and Video Processing

Multi-scale Edge-based U-shape Network for Salient Object Detection

Han Sun, Yetong Bian, Ningzhong Liu, 2021

Deep-learning-based salient object detection methods have achieved great improvements. However, problems remain in the predictions, such as blurry boundaries and inaccurate localization, which are mainly caused by inadequate feature extraction and integration. In this paper, we propose a Multi-scale Edge-based U-shape Network (MEUN) to integrate various features at different scales to achieve better performance. To extract more useful information for boundary prediction, U-shape Edge Network modules are embedded in each decoder unit. In addition, an extra down-sampling module alleviates the location inaccuracy. Experimental results on four benchmark datasets demonstrate the validity and reliability of the proposed method. MEUN also shows its superiority when compared with 15 state-of-the-art salient object detection methods.

Computer Vision and Pattern Recognition

Transformer Transforms Salient Object Detection and Camouflaged Object Detection

Yuxin Mao, Jing Zhang, Zhexiong Wan, 2021

Transformer networks are particularly good at modeling long-range dependencies within a long sequence. In this paper, we conduct research on applying transformer networks to salient object detection (SOD). We adopt the dense transformer backbone for fully supervised RGB image based SOD, RGB-D image pair based SOD, and weakly supervised SOD within a unified framework, based on the observation that the transformer backbone can provide accurate structure modeling, which makes it powerful in learning from weak labels with less structure information. Further, we find that vision transformer architectures do not offer direct spatial supervision, instead encoding position as a feature. Therefore, we investigate the contributions of two strategies to provide stronger spatial supervision through the transformer layers within our unified framework, namely deep supervision and difficulty-aware learning. We find that deep supervision can get gradients back into the higher-level features, thus leading to uniform activation within the same semantic object. Difficulty-aware learning, on the other hand, is capable of identifying the hard pixels for effective hard negative mining. We also visualize features of the conventional backbone and the transformer backbone before and after fine-tuning them for SOD, and find that the transformer backbone encodes more accurate object structure information and more distinct semantic information within the lower- and higher-level features, respectively. We also apply our model to camouflaged object detection (COD) and make similar observations as for the above three SOD tasks. Extensive experimental results on various SOD and COD tasks illustrate that transformer networks can transform SOD and COD, leading to new benchmarks for each related task. The source code and experimental results are available via our project page: https://github.com/fupiao1998/TrasformerSOD.

Computer Vision and Pattern Recognition

Reverse Attention for Salient Object Detection

Shuhan Chen, Xiuli Tan, Ben Wang, 2018

Benefiting from the rapid development of deep learning techniques, salient object detection has achieved remarkable progress recently. However, there still exist two major challenges that hinder its application in embedded devices: low-resolution output and heavy model weight. To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. More specifically, given a coarse saliency prediction in the deepest layer, we first employ residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while maintaining accuracy. Secondly, we further propose reverse attention to guide such side-output residual learning in a top-down manner. By erasing the current predicted salient regions from side-output features, the network can eventually explore the missing object parts and details, which results in high resolution and accuracy. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, with advantages in terms of simplicity, efficiency (45 FPS) and model size (81 MB).
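The "erasing" step described above can be sketched as an element-wise reweighting. The following is a minimal illustration under stated assumptions, not the authors' code: it assumes the attention weight at each pixel is 1 minus the sigmoid of the coarse saliency logit, already resized to the resolution of the side-output features. The names `sigmoid` and `reverse_attention` are hypothetical helpers.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def reverse_attention(features, coarse_logits):
    """Suppress features where the coarse prediction is already salient.

    features: list of channels, each an H x W nested list of floats.
    coarse_logits: H x W nested list of saliency logits, assumed to be
        already upsampled to the feature resolution.
    Each channel is multiplied element-wise by 1 - sigmoid(logit)
    (assumed form of the reverse attention weight).
    """
    weights = [[1.0 - sigmoid(v) for v in row] for row in coarse_logits]
    return [
        [
            [f * w for f, w in zip(f_row, w_row)]
            for f_row, w_row in zip(channel, weights)
        ]
        for channel in features
    ]


# Usage: one feature channel of ones; the top-left pixel is confidently
# salient, the top-right confidently background, the bottom row uncertain.
features = [[[1.0, 1.0], [1.0, 1.0]]]
coarse_logits = [[10.0, -10.0], [0.0, 0.0]]
erased = reverse_attention(features, coarse_logits)
```

With these weights, confidently salient pixels contribute almost nothing to the refined features, so a residual branch trained on them is pushed toward regions the coarse prediction missed.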

Computer Vision and Pattern Recognition
