Concatenated Feature Pyramid Network for Instance Segmentation

283 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Pranav Shenoy K P

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Yongqing Sun - Pranav Shenoy K P - Jun Shimamura

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Low level features like edges and textures play an important role in accurately localizing instances in neural networks. In this paper, we propose an architecture which improves feature pyramid networks commonly used instance segmentation networks by incorporating low level features in all layers of the pyramid in an optimal and efficient way. Specifically, we introduce a new layer which learns new correlations from feature maps of multiple feature pyramid levels holistically and enhances the semantic information of the feature pyramid to improve accuracy. Our architecture is simple to implement in instance segmentation or object detection frameworks to boost accuracy. Using this method in Mask RCNN, our model achieves consistent improvement in precision on COCO Dataset with the computational overhead compared to the original feature pyramid network.

قيم البحث

152 - Yu-Huan Wu , Yun Liu , Le Zhang 2020

Much of the recent efforts on salient object detection (SOD) have been devoted to producing accurate saliency maps without being aware of their instance labels. To this end, we propose a new pipeline for end-to-end salient instance segmentation (SIS) that predicts a class-agnostic mask for each detected salient instance. To better use the rich feature hierarchies in deep networks and enhance the side predictions, we propose the regularized dense connections, which attentively promote informative features and suppress non-informative ones from all feature pyramids. A novel multi-level RoIAlign based decoder is introduced to adaptively aggregate multi-level features for better mask predictions. Such strategies can be well-encapsulated into the Mask R-CNN pipeline. Extensive experiments on popular benchmarks demonstrate that our design significantly outperforms existing sArt competitors by 6.3% (58.6% vs. 52.3%) in terms of the AP metric.The code is available at https://github.com/yuhuan-wu/RDPNet.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation

141 - Miao Hu , Yali Li , Lu Fang 2021

Learning pyramidal feature representations is crucial for recognizing object instances at different scales. Feature Pyramid Network (FPN) is the classic architecture to build a feature pyramid with high-level semantics throughout. However, intrinsic defects in feature extraction and fusion inhibit FPN from further aggregating more discriminative features. In this work, we propose Attention Aggregation based Feature Pyramid Network (A^2-FPN), to improve multi-scale feature learning through attention-guided feature aggregation. In feature extraction, it extracts discriminative features by collecting-distributing multi-level global context features, and mitigates the semantic information loss due to drastically reduced channels. In feature fusion, it aggregates complementary information from adjacent features to generate location-wise reassembly kernels for content-aware sampling, and employs channel-wise reweighting to enhance the semantic consistency before element-wise addition. A^2-FPN shows consistent gains on different instance segmentation frameworks. By replacing FPN with A^2-FPN in Mask R-CNN, our model boosts the performance by 2.1% and 1.6% mask AP when using ResNet-50 and ResNet-101 as backbone, respectively. Moreover, A^2-FPN achieves an improvement of 2.0% and 1.4% mask AP when integrated into the strong baselines such as Cascade Mask R-CNN and Hybrid Task Cascade.

الرؤية الحاسوبية وتمييز الأنماط

Improving Video Instance Segmentation via Temporal Pyramid Routing

210 - Xiangtai Li , Hao He , Henghui Ding 2021

Video Instance Segmentation (VIS) is a new and inherently multi-task problem, which aims to detect, segment and track each instance in a video sequence. Existing approaches are mainly based on single-frame features or single-scale features of multipl e frames, where temporal information or multi-scale information is ignored. To incorporate both temporal and scale information, we propose a Temporal Pyramid Routing (TPR) strategy to conditionally align and conduct pixel-level aggregation from a feature pyramid pair of two adjacent frames. Specifically, TPR contains two novel components, including Dynamic Aligned Cell Routing (DACR) and Cross Pyramid Routing (CPR), where DACR is designed for aligning and gating pyramid features across temporal dimension, while CPR transfers temporally aggregated features across scale dimension. Moreover, our approach is a plug-and-play module and can be easily applied to existing instance segmentation methods. Extensive experiments on YouTube-VIS dataset demonstrate the effectiveness and efficiency of the proposed approach on several state-of-the-art instance segmentation methods. Codes and trained models will be publicly available to facilitate future research.(url{https://github.com/lxtGH/TemporalPyramidRouting}).

الرؤية الحاسوبية وتمييز الأنماط

GraphFPN: Graph Feature Pyramid Network for Object Detection

187 - Gangming Zhao , Weifeng Ge , 2021

Feature pyramids have been proven powerful in image understanding tasks that require multi-scale features. State-of-the-art methods for multi-scale feature learning focus on performing feature interactions across space and scales using neural network s with a fixed topology. In this paper, we propose graph feature pyramid networks that are capable of adapting their topological structures to varying intrinsic image structures and supporting simultaneous feature interactions across all scales. We first define an image-specific superpixel hierarchy for each input image to represent its intrinsic image structures. The graph feature pyramid network inherits its structure from this superpixel hierarchy. Contextual and hierarchical layers are designed to achieve feature interactions within the same scale and across different scales. To make these layers more powerful, we introduce two types of local channel attention for graph neural networks by generalizing global channel attention for convolutional neural networks. The proposed graph feature pyramid network can enhance the multiscale features from a convolutional feature pyramid network. We evaluate our graph feature pyramid network in the object detection task by integrating it into the Faster R-CNN algorithm. The modified algorithm outperforms not only previous state-of-the-art feature pyramid-based methods with a clear margin but also other popular detection methods on both MS-COCO 2017 validation and test datasets.

الرؤية الحاسوبية وتمييز الأنماط

RDCNet: Instance segmentation with a minimalist recurrent residual network

88 - Raphael Ortiz , Gustavo de Medeiros , Antoine H.F.M. Peters 2020

Instance segmentation is a key step for quantitative microscopy. While several machine learning based methods have been proposed for this problem, most of them rely on computationally complex models that are trained on surrogate tasks. Building on re cent developments towards end-to-end trainable instance segmentation, we propose a minimalist recurrent network called recurrent dilated convolutional network (RDCNet), consisting of a shared stacked dilated convolution (sSDC) layer that iteratively refines its output and thereby generates interpretable intermediate predictions. It is light-weight and has few critical hyperparameters, which can be related to physical aspects such as object size or density.We perform a sensitivity analysis of its main parameters and we demonstrate its versatility on 3 tasks with different imaging modalities: nuclear segmentation of H&E slides, of 3D anisotropic stacks from light-sheet fluorescence microscopy and leaf segmentation of top-view images of plants. It achieves state-of-the-art on 2 of the 3 datasets.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي