How Powerful are Performance Predictors in Neural Architecture Search?

130 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Colin White

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Colin White - Arber Zela - Binxin Ru

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Early methods in the rapidly developing field of neural architecture search (NAS) required fully training thousands of neural networks. To reduce this extreme computational cost, dozens of techniques have since been proposed to predict the final performance of neural architectures. Despite the success of such performance prediction methods, it is not well-understood how different families of techniques compare to one another, due to the lack of an agreed-upon evaluation metric and optimization for different constraints on the initialization time and query time. In this work, we give the first large-scale study of performance predictors by analyzing 31 techniques ranging from learning curve extrapolation, to weight-sharing, to supervised learning, to zero-cost proxies. We test a number of correlation- and rank-based performance measures in a variety of settings, as well as the ability of each technique to speed up predictor-based NAS frameworks. Our results act as recommendations for the best predictors to use in different settings, and we show that certain families of predictors can be combined to achieve even better predictive power, opening up promising research directions. Our code, featuring a library of 31 performance predictors, is available at https://github.com/automl/naslib.

قيم البحث

186 - Miao Zhang , Huiqi Li , Shirui Pan 2019

One-Shot Neural architecture search (NAS) attracts broad attention recently due to its capacity to reduce the computational hours through weight sharing. However, extensive experiments on several recent works show that there is no positive correlatio n between the validation accuracy with inherited weights from the supernet and the test accuracy after re-training for One-Shot NAS. Different from devising a controller to find the best performing architecture with inherited weights, this paper focuses on how to sample architectures to train the supernet to make it more predictive. A single-path supernet is adopted, where only a small part of weights are optimized in each step, to reduce the memory demand greatly. Furthermore, we abandon devising complicated reward based architecture sampling controller, and sample architectures to train supernet based on novelty search. An efficient novelty search method for NAS is devised in this paper, and extensive experiments demonstrate the effectiveness and efficiency of our novelty search based architecture sampling method. The best architecture obtained by our algorithm with the same search space achieves the state-of-the-art test error rate of 2.51% on CIFAR-10 with only 7.5 hours search time in a single GPU, and a validation perplexity of 60.02 and a test perplexity of 57.36 on PTB. We also transfer these search cell structures to larger datasets ImageNet and WikiText-2, respectively.

التعلم الآلي الحوسبة العصبية والتطورية التعلم الالي

Effective, Efficient and Robust Neural Architecture Search

190 - Zhixiong Yue , Baijiong Lin , Xiaonan Huang 2020

Recent advances in adversarial attacks show the vulnerability of deep neural networks searched by Neural Architecture Search (NAS). Although NAS methods can find network architectures with the state-of-the-art performance, the adversarial robustness and resource constraint are often ignored in NAS. To solve this problem, we propose an Effective, Efficient, and Robust Neural Architecture Search (E2RNAS) method to search a neural network architecture by taking the performance, robustness, and resource constraint into consideration. The objective function of the proposed E2RNAS method is formulated as a bi-level multi-objective optimization problem with the upper-level problem as a multi-objective optimization problem, which is different from existing NAS methods. To solve the proposed objective function, we integrate the multiple-gradient descent algorithm, a widely studied gradient-based multi-objective optimization algorithm, with the bi-level optimization. Experiments on benchmark datasets show that the proposed E2RNAS method can find adversarially robust architectures with optimized model size and comparable classification accuracy.

التعلم الآلي الحوسبة العصبية والتطورية

Neural Architecture Generator Optimization

131 - Binxin Ru , Pedro Esperanca , Fabio Carlucci 2020

Neural Architecture Search (NAS) was first proposed to achieve state-of-the-art performance through the discovery of new architecture patterns, without human intervention. An over-reliance on expert knowledge in the search space design has however le d to increased performance (local optima) without significant architectural breakthroughs, thus preventing truly novel solutions from being reached. In this work we 1) are the first to investigate casting NAS as a problem of finding the optimal network generator and 2) we propose a new, hierarchical and graph-based search space capable of representing an extremely large variety of network types, yet only requiring few continuous hyper-parameters. This greatly reduces the dimensionality of the problem, enabling the effective use of Bayesian Optimisation as a search strategy. At the same time, we expand the range of valid architectures, motivating a multi-objective learning approach. We demonstrate the effectiveness of this strategy on six benchmark datasets and show that our search space generates extremely lightweight yet highly competitive models.

التعلم الآلي الحوسبة العصبية والتطورية التعلم الالي

How Does Supernet Help in Neural Architecture Search?

223 - Yuge Zhang , Quanlu Zhang , Yaming Yang 2020

Weight sharing, as an approach to speed up architecture performance estimation has received wide attention. Instead of training each architecture separately, weight sharing builds a supernet that assembles all the architectures as its submodels. Howe ver, there has been debate over whether the NAS process actually benefits from weight sharing, due to the gap between supernet optimization and the objective of NAS. To further understand the effect of weight sharing on NAS, we conduct a comprehensive analysis on five search spaces, including NAS-Bench-101, NAS-Bench-201, DARTS-CIFAR10, DARTS-PTB, and ProxylessNAS. We find that weight sharing works well on some search spaces but fails on others. Taking a step forward, we further identified biases accounting for such phenomenon and the capacity of weight sharing. Our work is expected to inspire future NAS researchers to better leverage the power of weight sharing.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط الحوسبة العصبية والتطورية

Accelerating Neural Architecture Search using Performance Prediction

99 - Bowen Baker , Otkrist Gupta , Ramesh Raskar 2017

Methods for neural network hyperparameter optimization and meta-modeling are computationally expensive due to the need to train a large number of model configurations. In this paper, we show that standard frequentist regression models can predict the final performance of partially trained model configurations using features based on network architectures, hyperparameters, and time-series validation performance data. We empirically show that our performance prediction models are much more effective than prominent Bayesian counterparts, are simpler to implement, and are faster to train. Our models can predict final performance in both visual classification and language modeling domains, are effective for predicting performance of drastically varying model architectures, and can even generalize between model classes. Using these prediction models, we also propose an early stopping method for hyperparameter optimization and meta-modeling, which obtains a speedup of a factor up to 6x in both hyperparameter optimization and meta-modeling. Finally, we empirically show that our early stopping method can be seamlessly incorporated into both reinforcement learning-based architecture selection algorithms and bandit based search methods. Through extensive experimentation, we empirically show our performance prediction models and early stopping algorithm are state-of-the-art in terms of prediction accuracy and speedup achieved while still identifying the optimal model configurations.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط الحوسبة العصبية والتطورية