SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

109 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Xianzhi Du

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Xianzhi Du - Tsung-Yi Lin - Pengchong Jin

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Convolutional neural networks typically encode an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection). The encoder-decoder architectures are proposed to resolve this by applying a decoder network onto a backbone model designed for classification tasks. In this paper, we argue encoder-decoder architecture is ineffective in generating strong multi-scale features because of the scale-decreased backbone. We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search. Using similar building blocks, SpineNet models outperform ResNet-FPN models by ~3% AP at various scales while using 10-20% fewer FLOPs. In particular, SpineNet-190 achieves 52.5% AP with a MaskR-CNN detector and achieves 52.1% AP with a RetinaNet detector on COCO for a single model without test-time augmentation, significantly outperforms prior art of detectors. SpineNet can transfer to classification tasks, achieving 5% top-1 accuracy improvement on a challenging iNaturalist fine-grained dataset. Code is at: https://github.com/tensorflow/tpu/tree/master/models/official/detection.

قيم البحث

119 - Xianzhi Du , Tsung-Yi Lin , Pengchong Jin 2020

Recently, SpineNet has demonstrated promising results on object detection and image classification over ResNet model. However, it is unclear if the improvement adds up when combining scale-permuted backbone with advanced efficient operations and comp ound scaling. Furthermore, SpineNet is built with a uniform resource distribution over operations. While this strategy seems to be prevalent for scale-decreased models, it may not be an optimal design for scale-permuted models. In this work, we propose a simple technique to combine efficient operations and compound scaling with a previously learned scale-permuted architecture. We demonstrate the efficiency of scale-permuted model can be further improved by learning a resource distribution over the entire network. The resulting efficient scale-permuted models outperform state-of-the-art EfficientNet-based models on object detection and achieve competitive performance on image classification and semantic segmentation. Code and models will be open-sourced soon.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

Deep weakly-supervised learning methods for classification and localization in histology images: a survey

255 - Jer^ome Rony , Soufiane Belharbi , Jose Dolz 2019

Using state-of-the-art deep learning models for cancer diagnosis presents several challenges related to the nature and availability of labeled histology images. In particular, cancer grading and localization in these images normally relies on both im age- and pixel-level labels, the latter requiring a costly annotation process. In this survey, deep weakly-supervised learning (WSL) models are investigated to identify and locate diseases in histology images, without the need for pixel-level annotations. Given training data with global image-level labels, these models allow to simultaneously classify histology images and yield pixel-wise localization scores, thereby identifying the corresponding regions of interest (ROI). Since relevant WSL models have mainly been investigated within the computer vision community, and validated on natural scene images, we assess the extent to which they apply to histology images which have challenging properties, e.g. very large size, similarity between foreground/background, highly unstructured regions, stain heterogeneity, and noisy/ambiguous labels. The most relevant models for deep WSL are compared experimentally in terms of accuracy (classification and pixel-wise localization) on several public benchmark histology datasets for breast and colon cancer -- BACH ICIAR 2018, BreaKHis, CAMELYON16, and GlaS. Furthermore, for large-scale evaluation of WSL models on histology images, we propose a protocol to construct WSL datasets from Whole Slide Imaging. Results indicate that several deep learning models can provide a high level of classification accuracy, although accurate pixel-wise localization of cancer regions remains an issue for such images. Code is publicly available.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

Memory-efficient Learning for Large-scale Computational Imaging

173 - Michael Kellman , Kevin Zhang , Jon Tamir 2020

Critical aspects of computational imaging systems, such as experimental design and image priors, can be optimized through deep networks formed by the unrolled iterations of classical model-based reconstructions (termed physics-based networks). Howeve r, for real-world large-scale inverse problems, computing gradients via backpropagation is infeasible due to memory limitations of graphics processing units. In this work, we propose a memory-efficient learning procedure that exploits the reversibility of the networks layers to enable data-driven design for large-scale computational imaging systems. We demonstrate our method on a small-scale compressed sensing example, as well as two large-scale real-world systems: multi-channel magnetic resonance imaging and super-resolution optical microscopy.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

Learning the Redundancy-free Features for Generalized Zero-Shot Object Recognition

110 - Zongyan Han , Zhenyong Fu , Jian Yang 2020

Zero-shot object recognition or zero-shot learning aims to transfer the object recognition ability among the semantically related categories, such as fine-grained animal or bird species. However, the images of different fine-grained objects tend to m erely exhibit subtle differences in appearance, which will severely deteriorate zero-shot object recognition. To reduce the superfluous information in the fine-grained objects, in this paper, we propose to learn the redundancy-free features for generalized zero-shot learning. We achieve our motivation by projecting the original visual features into a new (redundancy-free) feature space and then restricting the statistical dependence between these two feature spaces. Furthermore, we require the projected features to keep and even strengthen the category relationship in the redundancy-free feature space. In this way, we can remove the redundant information from the visual features without losing the discriminative information. We extensively evaluate the performance on four benchmark datasets. The results show that our redundancy-free feature based generalized zero-shot learning (RFF-GZSL) approach can achieve competitive results compared with the state-of-the-arts.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

AutoSTR: Efficient Backbone Search for Scene Text Recognition

76 - Hui Zhang , Quanming Yao , Mingkun Yang 2020

Scene text recognition (STR) is very challenging due to the diversity of text instances and the complexity of scenes. The community has paid increasing attention to boost the performance by improving the pre-processing image module, like rectificatio n and deblurring, or the sequence translator. However, another critical module, i.e., the feature sequence extractor, has not been extensively explored. In this work, inspired by the success of neural architecture search (NAS), which can identify better architectures than human-designed ones, we propose automated STR (AutoSTR) to search data-dependent backbones to boost text recognition performance. First, we design a domain-specific search space for STR, which contains both choices on operations and constraints on the downsampling path. Then, we propose a two-step search algorithm, which decouples operations and downsampling path, for an efficient search in the given space. Experiments demonstrate that, by searching data-dependent backbones, AutoSTR can outperform the state-of-the-art approaches on standard benchmarks with much fewer FLOPS and model parameters.

الرؤية الحاسوبية وتمييز الأنماط