Tuning machine learning models with Bayesian optimization (BO) is a successful strategy for finding good hyperparameters. BO defines an iterative procedure in which a cross-validated metric is evaluated on promising hyperparameters. In practice, however, an improvement in the validation metric may not translate into better predictive performance on a test set, especially when tuning models trained on small datasets. In other words, contrary to conventional wisdom, BO can overfit. In this paper, we carry out the first systematic investigation of overfitting in BO and demonstrate that this issue is serious, yet often overlooked in practice. We propose a novel criterion for stopping BO early, which aims to maintain solution quality while avoiding the unnecessary iterations that can lead to overfitting. Experiments on real-world hyperparameter optimization problems show that our approach effectively meets these goals and is more adaptive than the baselines.
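To make the idea of stopping an iterative hyperparameter search early concrete, the following is a minimal sketch. It is not the criterion proposed in the paper, which the abstract does not specify. It assumes a simple patience rule as a stand-in: stop once the validation error has failed to improve for a fixed number of consecutive evaluations. The names `search_with_early_stop`, `toy_validation_error`, `patience`, and `min_delta` are hypothetical, and a fixed list of proposals replaces the BO acquisition step.

```python
from typing import Callable, Iterable, Optional, Tuple


def search_with_early_stop(
    validation_error: Callable[[float], float],
    proposals: Iterable[float],
    patience: int = 3,
    min_delta: float = 0.0,
) -> Tuple[Optional[float], float, int]:
    """Evaluate proposals in order and stop after `patience` stale evaluations.

    Assumption: a plain patience rule, used here only as a stand-in for a
    real early-stopping criterion. Returns the best hyperparameter, its
    validation error, and the number of evaluations actually spent.
    """
    best_x: Optional[float] = None
    best_err = float("inf")
    stale = 0  # consecutive evaluations without improvement
    used = 0
    for x in proposals:
        err = validation_error(x)
        used += 1
        if err < best_err - min_delta:
            # New incumbent: record it and reset the stale counter.
            best_x, best_err, stale = x, err, 0
        else:
            stale += 1
            if stale >= patience:
                break  # further evaluations are unlikely to pay off
    return best_x, best_err, used


def toy_validation_error(x: float) -> float:
    # Hypothetical stand-in for a cross-validated metric, minimized at x = 0.3.
    return (x - 0.3) ** 2


# Hypothetical sequence of hyperparameters; a BO loop would propose these
# one at a time from its surrogate model instead.
proposals = [0.9, 0.5, 0.3, 0.35, 0.25, 0.31, 0.8, 0.1]
best_x, best_err, used = search_with_early_stop(
    toy_validation_error, proposals, patience=3
)
```

In this toy run the search stops before exhausting the proposal list, which is the saving the abstract refers to. A patience rule only looks at the validation metric, so it cannot by itself detect the gap between validation and test performance that the paper studies.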
We examine two key questions in GAN training, namely overfitting and mode drop, from an empirical perspective. We show that when stochasticity is removed from the training procedure, GANs can overfit and exhibit almost no mode drop. Our results shed
The input space of a neural network with ReLU-like activations is partitioned into multiple linear regions, each corresponding to a specific activation pattern of the included ReLU-like activations. We demonstrate that this partition exhibits the fol
Many proposed methods for explaining machine learning predictions are in fact challenging to understand for nontechnical consumers. This paper builds upon an alternative consumer-driven approach called TED that asks for explanations to be provided in
Bayesian Optimization is a sample-efficient black-box optimization procedure that is typically applied to problems with a small number of independent objectives. However, in practice we often wish to optimize objectives defined over many correlated o
We derive an optimal policy for adaptively restarting a randomized algorithm, based on observed features of the run-so-far, so as to minimize the expected time required for the algorithm to successfully terminate. Given a suitable Bayesian prior, thi