Using a thousand optimization tasks to learn hyperparameter search strategies

114 0 0.0 ( 0 )

Download Cite

Added by Luke Metz

Publication date 2020

fields Informatics Engineering Mathematical Statistics

and research's language is English

Authors Luke Metz - Niru Maheswaranathan - Ruoxi Sun

Machine Learning Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We present TaskSet, a dataset of tasks for use in training and evaluating optimizers. TaskSet is unique in its size and diversity, containing over a thousand tasks ranging from image classification with fully connected or convolutional neural networks, to variational autoencoders, to non-volume preserving flows on a variety of datasets. As an example application of such a dataset we explore meta-learning an ordered list of hyperparameters to try sequentially. By learning this hyperparameter list from data generated using TaskSet we achieve large speedups in sample efficiency over random search. Next we use the diversity of the TaskSet and our method for learning hyperparameter lists to empirically explore the generalization of these lists to new optimization tasks in a variety of settings including ImageNet classification with Resnet50 and LM1B language modeling with transformers. As part of this work we have opensourced code for all tasks, as well as ~29 million training curves for these problems and the corresponding hyperparameters.

rate research

Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

129 - Lisha Li , Kevin Jamieson , Giulia DeSalvo 2016

Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. While recent approaches use Bayesian optimization to adaptively select configurations, we focus on speeding up random search through adaptive resource allocation and early-stopping. We formulate hyperparameter optimization as a pure-exploration non-stochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations. We introduce a novel algorithm, Hyperband, for this framework and analyze its theoretical properties, providing several desirable guarantees. Furthermore, we compare Hyperband with popular Bayesian optimization methods on a suite of hyperparameter optimization problems. We observe that Hyperband can provide over an order-of-magnitude speedup over our competitor set on a variety of deep-learning and kernel-based learning problems.

Machine Learning Machine Learning

YAHPO Gym -- Design Criteria and a new Multifidelity Benchmark for Hyperparameter Optimization

78 - Florian Pfisterer , Lennart Schneider , Julia Moosbauer 2021

When developing and analyzing new hyperparameter optimization (HPO) methods, it is vital to empirically evaluate and compare them on well-curated benchmark suites. In this work, we list desirable properties and requirements for such benchmarks and propose a new set of challenging and relevant multifidelity HPO benchmark problems motivated by these requirements. For this, we revisit the concept of surrogate-based benchmarks and empirically compare them to more widely-used tabular benchmarks, showing that the latter ones may induce bias in performance estimation and ranking of HPO methods. We present a new surrogate-based benchmark suite for multifidelity HPO methods consisting of 9 benchmark collections that constitute over 700 multifidelity HPO problems in total. All our benchmarks also allow for querying of multiple optimization targets, enabling the benchmarking of multi-objective HPO. We examine and compare our benchmark suite with respect to the defined requirements and show that our benchmarks provide viable additions to existing suites.

Machine Learning Machine Learning

Efficient Online Hyperparameter Optimization for Kernel Ridge Regression with Applications to Traffic Time Series Prediction

88 - Hongyuan Zhan , Gabriel Gomes , Xiaoye S. Li 2018

Computational efficiency is an important consideration for deploying machine learning models for time series prediction in an online setting. Machine learning algorithms adjust model parameters automatically based on the data, but often require users to set additional parameters, known as hyperparameters. Hyperparameters can significantly impact prediction accuracy. Traffic measurements, typically collected online by sensors, are serially correlated. Moreover, the data distribution may change gradually. A typical adaptation strategy is periodically re-tuning the model hyperparameters, at the cost of computational burden. In this work, we present an efficient and principled online hyperparameter optimization algorithm for Kernel Ridge regression applied to traffic prediction problems. In tests with real traffic measurement data, our approach requires as little as one-seventh of the computation time of other tuning methods, while achieving better or similar prediction accuracy.

Machine Learning Machine Learning

Using Under-trained Deep Ensembles to Learn Under Extreme Label Noise

103 - Konstantinos Nikolaidis , Thomas Plagemann , Stein Kristiansen 2020

Improper or erroneous labelling can pose a hindrance to reliable generalization for supervised learning. This can have negative consequences, especially for critical fields such as healthcare. We propose an effective new approach for learning under extreme label noise, based on under-trained deep ensembles. Each ensemble member is trained with a subset of the training data, to acquire a general overview of the decision boundary separation, without focusing on potentially erroneous details. The accumulated knowledge of the ensemble is combined to form new labels, that determine a better class separation than the original labels. A new model is trained with these labels to generalize reliably despite the label noise. We focus on a healthcare setting and extensively evaluate our approach on the task of sleep apnea detection. For comparison with related work, we additionally evaluate on the task of digit recognition. In our experiments, we observed performance improvement in accuracy from 6.7% up-to 49.3% for the task of digit classification and in kappa from 0.02 up-to 0.55 for the task of sleep apnea detection.

Machine Learning Machine Learning

Hyperparameter Optimization with Differentiable Metafeatures

131 - Hadi S. Jomaa , Lars Schmidt-Thieme , Josif Grabocka 2021

Metafeatures, or dataset characteristics, have been shown to improve the performance of hyperparameter optimization (HPO). Conventionally, metafeatures are precomputed and used to measure the similarity between datasets, leading to a better initialization of HPO models. In this paper, we propose a cross dataset surrogate model called Differentiable Metafeature-based Surrogate (DMFBS), that predicts the hyperparameter response, i.e. validation loss, of a model trained on the dataset at hand. In contrast to existing models, DMFBS i) integrates a differentiable metafeature extractor and ii) is optimized using a novel multi-task loss, linking manifold regularization with a dataset similarity measure learned via an auxiliary dataset identification meta-task, effectively enforcing the response approximation for similar datasets to be similar. We compare DMFBS against several recent models for HPO on three large meta-datasets and show that it consistently outperforms all of them with an average 10% improvement. Finally, we provide an extensive ablation study that examines the different components of our approach.

Machine Learning

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Using a thousand optimization tasks to learn hyperparameter search strategies

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions