Hyperparameter optimization aims to find the hyperparameter configuration of a machine learning model that yields the best performance on a validation dataset. Manual search often gets stuck in a locally good configuration and depends heavily on human intuition and experience. A simple alternative to manual search is random or grid search over a hyperparameter space, but these still require extensive validation-error evaluations to find the best configuration. Bayesian optimization, a global optimization method for black-box functions, is now popular for hyperparameter optimization because it greatly reduces the number of validation-error evaluations required compared to random/grid search. However, Bayesian optimization typically searches for the best configuration from a random initialization, without any prior knowledge. This motivates us to let Bayesian optimization start from configurations that were successful on similar datasets, which can markedly reduce the number of evaluations. In this paper, we propose deep metric learning to learn meta-features over datasets such that similarity between datasets is effectively measured by the Euclidean distance between their associated meta-features. To this end, we introduce a Siamese network composed of a deep feature extractor and a meta-feature extractor, where the deep feature extractor provides a semantic representation of each instance in a dataset and the meta-feature extractor aggregates the resulting set of deep features into a single representation of the whole dataset. The learned meta-features are then used to select a few datasets similar to a new dataset, so that hyperparameter configurations that performed well on those datasets serve as initializations to warm-start Bayesian hyperparameter optimization.
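As a rough illustration of the warm-start step only, the sketch below embeds each dataset by projecting its instances and mean-pooling, then reuses the best-known configurations of the nearest archived datasets as initial points for Bayesian optimization. Everything here is hypothetical: the fixed projection W stands in for the trained Siamese extractors, and the archive of configurations is fabricated for the example.

```python
# A hypothetical sketch of the warm-start step (not the authors' code): embed
# each dataset by projecting instances and mean-pooling, then reuse the best
# configurations of the nearest archived datasets. The fixed projection W
# stands in for the trained Siamese feature/meta-feature extractors.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))                  # stand-in for a learned extractor

def meta_feature(X):
    """Per-instance features -> one dataset-level vector (mean pooling)."""
    return (X @ W).mean(axis=0)

# Fabricated archive: (dataset, best-known hyperparameter configuration) pairs.
archive = [(rng.normal(size=(100, 16)), {"lr": 10.0 ** -(i + 1), "depth": i + 2})
           for i in range(5)]
archive_embs = np.stack([meta_feature(X) for X, _ in archive])

def warm_start_configs(X_new, k=2):
    """Best configs of the k datasets nearest in meta-feature space."""
    dists = np.linalg.norm(archive_embs - meta_feature(X_new), axis=1)
    return [archive[i][1] for i in np.argsort(dists)[:k]]

print(warm_start_configs(rng.normal(size=(80, 16))))  # initial points for BO
```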
Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure.
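This idea lends itself to a compact sketch: treat the unrolled training loop as a differentiable function of the hyperparameter and differentiate the resulting validation loss. The toy below, assuming JAX, uses ridge-regularized least squares with the regularization weight as the sole hyperparameter; the original work differentiates through full SGD training with momentum, which this miniature only gestures at.

```python
# A minimal sketch, assuming JAX: treat the unrolled training loop as a
# differentiable function of the hyperparameter (here a ridge weight `lam`)
# and take the exact gradient of the validation loss through all SGD steps.
import jax
import jax.numpy as jnp

Xtr = jax.random.normal(jax.random.PRNGKey(0), (50, 5))
ytr = Xtr @ jnp.ones(5)                       # toy regression targets
Xva = jax.random.normal(jax.random.PRNGKey(1), (20, 5))
yva = Xva @ jnp.ones(5)

def train_loss(w, lam):
    return jnp.mean((Xtr @ w - ytr) ** 2) + lam * jnp.sum(w ** 2)

def val_loss_after_training(lam, steps=100, lr=0.05):
    w = jnp.zeros(5)
    for _ in range(steps):                    # unrolled training procedure
        w = w - lr * jax.grad(train_loss)(w, lam)
    return jnp.mean((Xva @ w - yva) ** 2)

# d(validation loss) / d(lam), chained through every training step.
print(jax.grad(val_loss_after_training)(0.1))
```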
We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be solved by explicitly taking into account the optimization dynamics of the inner objective.
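For concreteness, the bilevel problem in question can be written as below, with E the outer (validation) objective, L_train the inner (training) objective, w the model parameters, and lambda the hyperparameters; the approximation replaces the exact inner argmin with T steps of the inner optimization dynamics Phi. The symbols here are chosen for illustration, not quoted from the paper.

```latex
% Exact bilevel problem:
\min_{\lambda}\; E\big(w_{\lambda}, \lambda\big)
\quad \text{s.t.} \quad
w_{\lambda} \in \arg\min_{w}\; L_{\text{train}}(w, \lambda)

% Tractable approximation: unroll T steps of the inner dynamics \Phi:
\min_{\lambda}\; E\big(w_{T}(\lambda), \lambda\big),
\qquad
w_{t+1}(\lambda) = \Phi\big(w_{t}(\lambda), \lambda\big), \quad t = 0, \dots, T-1
```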
Most machine learning algorithms are configured by one or several hyperparameters that must be carefully chosen and often considerably impact performance. To avoid a time-consuming and unreproducible manual trial-and-error process for finding well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods can be employed.
Hyperparameter optimization in machine learning (ML) deals with the problem of empirically learning an optimal algorithm configuration from data, usually formulated as a black-box optimization problem. In this work, we propose a zero-shot method to meta-learn symbolic default hyperparameter configurations that are expressed in terms of the properties of the dataset.
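To make "zero-shot" concrete: a symbolic default is a formula over dataset properties that yields a configuration with no tuning on the new dataset. The snippet below shows the flavor using a classic hand-crafted instance, the square-root rule for a random forest's feature-subsample size; the function name and its use here are illustrative, not taken from the paper.

```python
# An illustrative "symbolic default": a hyperparameter given as a formula over
# dataset properties, applied zero-shot to a new dataset. The square-root rule
# for a random forest's feature-subsample size is a classic hand-crafted
# example of such a formula (the function name here is ours).
import math

def symbolic_default_mtry(n_features: int) -> int:
    return max(1, round(math.sqrt(n_features)))

print(symbolic_default_mtry(100))  # -> 10, with no tuning on the new dataset
```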
Bayesian optimization is a popular method for solving the problem of global optimization of an expensive-to-evaluate black-box function. It relies on a probabilistic surrogate model of the objective function, upon which an acquisition function is built to decide where the objective should be evaluated next.
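A minimal sketch of that surrogate/acquisition loop follows, assuming a Gaussian-process surrogate with a unit-variance RBF kernel and the expected-improvement acquisition; the objective is a toy and all function names are ours, not from any particular library.

```python
# One step of Bayesian optimization: fit a GP surrogate to the observations,
# then evaluate the objective next where expected improvement (EI) is maximal.
import numpy as np
from math import erf, sqrt, pi

def rbf(A, B, ls=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_posterior(Xobs, yobs, Xq, noise=1e-6):
    """Posterior mean/std of a zero-mean GP with unit-variance RBF kernel."""
    K = rbf(Xobs, Xobs) + noise * np.eye(len(Xobs))
    Ks = rbf(Xq, Xobs)
    mu = Ks @ np.linalg.solve(K, yobs)
    v = np.linalg.solve(K, Ks.T)
    var = 1.0 - (Ks * v.T).sum(axis=1)        # prior variance rbf(x, x) = 1
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    """EI for minimization: E[max(best - f, 0)] under N(mu, sigma^2)."""
    z = (best - mu) / sigma
    Phi = 0.5 * (1 + np.vectorize(erf)(z / sqrt(2)))   # normal CDF
    phi = np.exp(-0.5 * z**2) / sqrt(2 * pi)           # normal PDF
    return (best - mu) * Phi + sigma * phi

f = lambda x: np.sin(3 * x[:, 0]) + x[:, 0] ** 2       # toy objective
X = np.array([[0.1], [0.6], [0.9]]); y = f(X)          # observations so far
Xq = np.linspace(0, 1, 200)[:, None]                   # candidate points
mu, sigma = gp_posterior(X, y, Xq)
x_next = Xq[np.argmax(expected_improvement(mu, sigma, y.min()))]
print("next point to evaluate:", x_next)
```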