No Arabic abstract
Neural Architecture Search (NAS) has shown great potentials in automatically designing scalable network architectures for dense image predictions. However, existing NAS algorithms usually compromise on restricted search space and search on proxy task to meet the achievable computational demands. To allow as wide as possible network architectures and avoid the gap between target and proxy dataset, we propose a Densely Connected NAS (DCNAS) framework, which directly searches the optimal network structures for the multi-scale representations of visual information, over a large-scale target dataset. Specifically, by connecting cells with each other using learnable weights, we introduce a densely connected search space to cover an abundance of mainstream network designs. Moreover, by combining both path-level and channel-level sampling strategies, we design a fusion module to reduce the memory consumption of ample search space. We demonstrate that the architecture obtained from our DCNAS algorithm achieves state-of-the-art performances on public semantic image segmentation benchmarks, including 84.3% on Cityscapes, and 86.9% on PASCAL VOC 2012. We also retain leading performances when evaluating the architecture on the more challenging ADE20K and Pascal Context dataset.
Much of the recent efforts on salient object detection (SOD) have been devoted to producing accurate saliency maps without being aware of their instance labels. To this end, we propose a new pipeline for end-to-end salient instance segmentation (SIS) that predicts a class-agnostic mask for each detected salient instance. To better use the rich feature hierarchies in deep networks and enhance the side predictions, we propose the regularized dense connections, which attentively promote informative features and suppress non-informative ones from all feature pyramids. A novel multi-level RoIAlign based decoder is introduced to adaptively aggregate multi-level features for better mask predictions. Such strategies can be well-encapsulated into the Mask R-CNN pipeline. Extensive experiments on popular benchmarks demonstrate that our design significantly outperforms existing sArt competitors by 6.3% (58.6% vs. 52.3%) in terms of the AP metric.The code is available at https://github.com/yuhuan-wu/RDPNet.
Modern solutions to the single image super-resolution (SISR) problem using deep neural networks aim not only at better performance accuracy but also at a lighter and computationally efficient model. To that end, recently, neural architecture search (NAS) approaches have shown some tremendous potential. Following the same underlying, in this paper, we suggest a novel trilevel NAS method that provides a better balance between different efficiency metrics and performance to solve SISR. Unlike available NAS, our search is more complete, and therefore it leads to an efficient, optimized, and compressed architecture. We innovatively introduce a trilevel search space modeling, i.e., hierarchical modeling on network-, cell-, and kernel-level structures. To make the search on trilevel spaces differentiable and efficient, we exploit a new sparsestmax technique that is excellent at generating sparse distributions of individual neural architecture candidates so that they can be better disentangled for the final selection from the enlarged search space. We further introduce the sorting technique to the sparsestmax relaxation for better network-level compression. The proposed NAS optimization additionally facilitates simultaneous search and training in a single phase, reducing search time and train time. Comprehensive evaluations on the benchmark datasets show our methods clear superiority over the state-of-the-art NAS in terms of a good trade-off between model size, performance, and efficiency.
This paper studies the problem of learning semantic segmentation from image-level supervision only. Current popular solutions leverage object localization maps from classifiers as supervision signals, and struggle to make the localization maps capture more complete object content. Rather than previous efforts that primarily focus on intra-image information, we address the value of cross-image semantic relations for comprehensive object pattern mining. To achieve this, two neural co-attentions are incorporated into the classifier to complimentarily capture cross-image semantic similarities and differences. In particular, given a pair of training images, one co-attention enforces the classifier to recognize the common semantics from co-attentive objects, while the other one, called contrastive co-attention, drives the classifier to identify the unshared semantics from the rest, uncommon objects. This helps the classifier discover more object patterns and better ground semantics in image regions. In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference, hence eventually benefiting semantic segmentation learning. More essentially, our algorithm provides a unified framework that handles well different WSSS settings, i.e., learning WSSS with (1) precise image-level supervision only, (2) extra simple single-label data, and (3) extra noisy web data. It sets new state-of-the-arts on all these settings, demonstrating well its efficacy and generalizability. Moreover, our approach ranked 1st place in the Weakly-Supervised Semantic Segmentation Track of CVPR2020 Learning from Imperfect Data Challenge.
Neural architecture search (NAS) has witnessed prevailing success in image classification and (very recently) segmentation tasks. In this paper, we present the first preliminary study on introducing the NAS algorithm to generative adversarial networks (GANs), dubbed AutoGAN. The marriage of NAS and GANs faces its unique challenges. We define the search space for the generator architectural variations and use an RNN controller to guide the search, with parameter sharing and dynamic-resetting to accelerate the process. Inception score is adopted as the reward, and a multi-level search strategy is introduced to perform NAS in a progressive way. Experiments validate the effectiveness of AutoGAN on the task of unconditional image generation. Specifically, our discovered architectures achieve highly competitive performance compared to current state-of-the-art hand-crafted GANs, e.g., setting new state-of-the-art FID scores of 12.42 on CIFAR-10, and 31.01 on STL-10, respectively. We also conclude with a discussion of the current limitations and future potential of AutoGAN. The code is available at https://github.com/TAMU-VITA/AutoGAN
Panoptic segmentation is posed as a new popular test-bed for the state-of-the-art holistic scene understanding methods with the requirement of simultaneously segmenting both foreground things and background stuff. The state-of-the-art panoptic segmentation network exhibits high structural complexity in different network components, i.e. backbone, proposal-based foreground branch, segmentation-based background branch, and feature fusion module across branches, which heavily relies on expert knowledge and tedious trials. In this work, we propose an efficient, cooperative and highly automated framework to simultaneously search for all main components including backbone, segmentation branches, and feature fusion module in a unified panoptic segmentation pipeline based on the prevailing one-shot Network Architecture Search (NAS) paradigm. Notably, we extend the common single-task NAS into the multi-component scenario by taking the advantage of the newly proposed intra-modular search space and problem-oriented inter-modular search space, which helps us to obtain an optimal network architecture that not only performs well in both instance segmentation and semantic segmentation tasks but also be aware of the reciprocal relations between foreground things and background stuff classes. To relieve the vast computation burden incurred by applying NAS to complicated network architectures, we present a novel path-priority greedy search policy to find a robust, transferrable architecture with significantly reduced searching overhead. Our searched architecture, namely Auto-Panoptic, achieves the new state-of-the-art on the challenging COCO and ADE20K benchmarks. Moreover, extensive experiments are conducted to demonstrate the effectiveness of path-priority policy and transferability of Auto-Panoptic across different datasets. Codes and models are available at: https://github.com/Jacobew/AutoPanoptic.