Geometry-Aware Gradient Algorithms for Neural Architecture Search


Published by Mikhail Khodak

Publication date: 2020

Research field: Informatics Engineering

Language: English

Authors: Liam Li, Mikhail Khodak, Maria-Florina Balcan

Machine Learning, Computer Vision and Pattern Recognition, Neural and Evolutionary Computing


Abstract

Recent state-of-the-art methods for neural architecture search (NAS) exploit gradient-based optimization by relaxing the problem into continuous optimization over architectures and shared-weights, a noisy process that remains poorly understood. We argue for the study of single-level empirical risk minimization to understand NAS with weight-sharing, reducing the design of NAS methods to devising optimizers and regularizers that can quickly obtain high-quality solutions to this problem. Invoking the theory of mirror descent, we present a geometry-aware framework that exploits the underlying structure of this optimization to return sparse architectural parameters, leading to simple yet novel algorithms that enjoy fast convergence guarantees and achieve state-of-the-art accuracy on the latest NAS benchmarks in computer vision. Notably, we exceed the best published results for both CIFAR and ImageNet on both the DARTS search space and NAS-Bench201; on the latter we achieve near-oracle-optimal performance on CIFAR-10 and CIFAR-100. Together, our theory and experiments demonstrate a principled way to co-design optimizers and continuous relaxations of discrete NAS search spaces.
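To make the mirror descent idea concrete, the sketch below shows exponentiated gradient, a standard instance of mirror descent with an entropic regularizer on the probability simplex. It is an illustration under stated assumptions, not the authors' implementation: the function name `exponentiated_gradient_step`, the three-operation toy problem, the fixed gradient, and the step size are all hypothetical. In an actual weight-sharing setup the gradient would come from backpropagation through the shared-weights network.

```python
import math


def exponentiated_gradient_step(theta, grad, lr):
    """One entropic mirror descent step on the probability simplex.

    theta: current weights over candidate operations (positive, sums to 1).
    grad:  gradient of the loss with respect to theta (hypothetical input).
    lr:    step size.
    """
    # Multiplicative update theta_i * exp(-lr * grad_i).
    # Subtracting the largest exponent keeps exp() numerically stable
    # and does not change the result after normalization.
    exponents = [-lr * g for g in grad]
    shift = max(exponents)
    unnormalized = [t * math.exp(e - shift) for t, e in zip(theta, exponents)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Toy example (assumed): three candidate operations and a fixed linear loss
# in which operation 0 has the smallest gradient component.
theta = [1 / 3, 1 / 3, 1 / 3]
grad = [0.5, 1.0, 2.0]
for _ in range(50):
    theta = exponentiated_gradient_step(theta, grad, lr=0.5)
```

Because the update is multiplicative, weights on poorly performing operations shrink geometrically while the iterate stays on the simplex, which is one way to see why this geometry favors sparse architecture parameters.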


Xiaofang Wang, Shengcao Cao, Mengtian Li (2021)

Existing neural architecture search (NAS) methods often return an architecture that performs well during search but generalizes poorly to the test setting. To achieve better generalization, we propose a novel neighborhood-aware NAS formulation to identify flat-minima architectures in the search space, under the assumption that flat minima generalize better than sharp minima. The phrase flat-minima architecture refers to architectures whose performance is stable under small perturbations in the architecture (e.g., replacing a convolution with a skip connection). Our formulation takes the flatness of an architecture into account by aggregating the performance over the neighborhood of this architecture. We demonstrate a principled way to apply our formulation to existing search algorithms, including sampling-based algorithms and gradient-based algorithms. To facilitate the application to gradient-based algorithms, we also propose a differentiable representation for the neighborhood of architectures. Based on our formulation, we propose neighborhood-aware random search (NA-RS) and neighborhood-aware differentiable architecture search (NA-DARTS). Notably, by simply augmenting DARTS with our formulation, NA-DARTS finds architectures that perform better than or on par with those found by state-of-the-art NAS methods on established benchmarks, including CIFAR-10, CIFAR-100 and ImageNet.

Machine Learning, Computer Vision and Pattern Recognition

Sub-Architecture Ensemble Pruning in Neural Architecture Search

Yijun Bian, Qingquan Song, Mengnan Du (2019)

Neural architecture search (NAS) has gained increasing attention in recent years due to its flexibility and remarkable capability to reduce the burden of neural network design. To achieve better performance, however, the search process usually requires massive computation that might not be affordable for researchers and practitioners. While recent attempts have employed ensemble learning methods to mitigate the enormous computational cost, they neglect a key property of ensemble methods, namely diversity, which leads to collecting more similar sub-architectures with potential redundancy in the final design. To tackle this problem, we propose a pruning method for NAS ensembles called Sub-Architecture Ensemble Pruning in Neural Architecture Search (SAEP). It aims to leverage diversity and to obtain smaller sub-ensemble architectures whose performance is comparable to that of unpruned ensemble architectures. Three possible solutions are proposed to decide which sub-architectures to prune during the search process. Experimental results demonstrate the effectiveness of the proposed method, which substantially reduces the number of sub-architectures without degrading performance.

Machine Learning, Computer Vision and Pattern Recognition, Neural and Evolutionary Computing

Multinomial Distribution Learning for Effective Neural Architecture Search

Xiawu Zheng, Rongrong Ji, Lang Tang (2019)

Architectures obtained by Neural Architecture Search (NAS) have achieved highly competitive performance in various computer vision tasks. However, the prohibitive computation demand of forward-backward propagation in deep neural networks and searching algorithms makes it difficult to apply NAS in practice. In this paper, we propose Multinomial Distribution Learning for extremely effective NAS, which considers the search space as a joint multinomial distribution, i.e., the operation between two nodes is sampled from this distribution, and the optimal network structure is obtained by taking the operations with the highest probability in this distribution. Therefore, NAS can be transformed into a multinomial distribution learning problem, i.e., the distribution is optimized to have a high expectation of the performance. In addition, a hypothesis that the performance ranking is consistent in every training epoch is proposed and demonstrated to further accelerate the learning process. Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of our method. On CIFAR-10, the structure searched by our method achieves 2.55% test error, while being 6.0x (only 4 GPU hours on GTX1080Ti) faster compared with state-of-the-art NAS algorithms. On ImageNet, our model achieves 75.2% top-1 accuracy under MobileNet settings (MobileNet V1/V2), while being 1.2x faster with measured GPU latency. Test code with pre-trained models is available at https://github.com/tanglang96/MDENAS
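The sampling and selection steps described in this abstract can be sketched as follows. This is a minimal illustration, not the code from the linked repository: the helper names `sample_architecture` and `most_likely_architecture`, the two-edge search space, and the probability values are assumptions made for the example, and the step that updates the distribution from observed performance is omitted.

```python
import random


def sample_architecture(edge_probs, rng):
    """Sample one operation index per edge from its categorical distribution."""
    return [rng.choices(range(len(p)), weights=p, k=1)[0] for p in edge_probs]


def most_likely_architecture(edge_probs):
    """Select the highest-probability operation on every edge."""
    return [max(range(len(p)), key=p.__getitem__) for p in edge_probs]


# Hypothetical search space: two edges, three candidate operations each.
edge_probs = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
]

rng = random.Random(0)  # seeded so repeated runs draw the same samples
sampled = sample_architecture(edge_probs, rng)   # used during search
final = most_likely_architecture(edge_probs)     # used to derive the final network
```

During search, sampled architectures would be trained briefly and their performance used to adjust `edge_probs`; the final network is then read off with the arg-max rule.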

Machine Learning, Computer Vision and Pattern Recognition, Neural and Evolutionary Computing

Accelerating Neural Architecture Search using Performance Prediction

Bowen Baker, Otkrist Gupta, Ramesh Raskar (2017)

Methods for neural network hyperparameter optimization and meta-modeling are computationally expensive due to the need to train a large number of model configurations. In this paper, we show that standard frequentist regression models can predict the final performance of partially trained model configurations using features based on network architectures, hyperparameters, and time-series validation performance data. We empirically show that our performance prediction models are much more effective than prominent Bayesian counterparts, are simpler to implement, and are faster to train. Our models can predict final performance in both visual classification and language modeling domains, are effective for predicting performance of drastically varying model architectures, and can even generalize between model classes. Using these prediction models, we also propose an early stopping method for hyperparameter optimization and meta-modeling, which obtains a speedup of up to 6x in both hyperparameter optimization and meta-modeling. Finally, we empirically show that our early stopping method can be seamlessly incorporated into both reinforcement learning-based architecture selection algorithms and bandit-based search methods. Through extensive experimentation, we empirically show that our performance prediction models and early stopping algorithm are state-of-the-art in terms of prediction accuracy and speedup achieved while still identifying the optimal model configurations.

Machine Learning, Computer Vision and Pattern Recognition, Neural and Evolutionary Computing

Neural Architecture Search of SPD Manifold Networks

Rhea Sanjay Sukthanker, Zhiwu Huang, Suryansh Kumar (2020)

In this paper, we propose a new neural architecture search (NAS) problem of Symmetric Positive Definite (SPD) manifold networks, aiming to automate the design of SPD neural architectures. To address this problem, we first introduce a geometrically rich and diverse SPD neural architecture search space for an efficient SPD cell design. Further, we model our new NAS problem with a one-shot training process of a single supernet. Based on the supernet modeling, we exploit a differentiable NAS algorithm on our relaxed continuous search space for SPD neural architecture search. Statistical evaluation of our method on drone, action, and emotion recognition tasks mostly provides better results than the state-of-the-art SPD networks and traditional NAS algorithms. Empirical results show that our algorithm excels in discovering better-performing SPD network designs and provides models that are more than three times lighter than those found by state-of-the-art NAS algorithms.

Machine Learning, Computer Vision and Pattern Recognition, Neural and Evolutionary Computing