Non-smooth Bayesian Optimization in Tuning Problems

Published by: Hengrui Luo

Publication date: 2021

Research field: Informatics Engineering, Mathematical Statistics

Paper language: English

Authors: Hengrui Luo - James W. Demmel - Younghyun Cho

Machine learning

Abstract

Building surrogate models is a common approach to learning unknown black-box functions. Bayesian optimization provides a framework for building surrogate models from sequential samples drawn from the function and for finding its optimum. Tuning algorithmic parameters to optimize the performance of large, complicated black-box application codes is an important special case, in which the goal is to find the optima of black-box functions. Within the Bayesian optimization framework, the Gaussian process model produces smooth or continuous sample paths. However, the black-box function in a tuning problem is often non-smooth. The difficulty is compounded by the fact that we usually have only a limited number of sequential samples from the black-box function. Motivated by these issues, we propose a novel additive Gaussian process model called the clustered Gaussian process (cGP), in which the additive components are induced by clustering. With this surrogate model, we aim to capture the non-smoothness of the black-box function. In the examples we studied, performance improved by as much as 90% across repeated experiments. In addition to an algorithm for constructing the model, we apply it to several artificial and real applications to evaluate it.
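To make the idea of cluster-induced additive components concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' cGP algorithm: the two-means clustering on the responses, the nearest-neighbour rule for assigning new points to a cluster, the fixed kernel length scale, the toy function `f`, and all class and function names are hypothetical choices made for this example. The surrogate is a sum of per-cluster Gaussian process means, each active only on its own cluster, and it is compared with a single Gaussian process on a function that jumps at x = 0.5.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.1):
    """Squared-exponential kernel matrix between two 1-D point sets."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


class ConstantMeanGP:
    """GP regression (posterior mean only) around the sample mean of y."""

    def __init__(self, x, y, jitter=1e-8):
        self.x = x
        self.mean = y.mean()
        gram = rbf_kernel(x, x) + jitter * np.eye(len(x))
        self.alpha = np.linalg.solve(gram, y - self.mean)

    def predict(self, x_new):
        return self.mean + rbf_kernel(x_new, self.x) @ self.alpha


def two_means_labels(y, iters=20):
    """Tiny 1-D k-means with k=2 on the responses (assumed clustering step).

    Assumes y takes at least two distinct values, so no cluster is empty.
    """
    centers = np.array([y.min(), y.max()], dtype=float)
    for _ in range(iters):
        labels = np.argmin(np.abs(y[:, None] - centers[None, :]), axis=1)
        centers = np.array([y[labels == k].mean() for k in range(2)])
    return labels


class ClusteredGP:
    """One GP per cluster; a new point uses the GP of its nearest training point."""

    def __init__(self, x, y):
        self.x = x
        self.labels = two_means_labels(y)
        self.models = [
            ConstantMeanGP(x[self.labels == k], y[self.labels == k])
            for k in range(2)
        ]

    def predict(self, x_new):
        nearest = np.argmin(np.abs(x_new[:, None] - self.x[None, :]), axis=1)
        assigned = self.labels[nearest]
        out = np.empty(len(x_new))
        for k, model in enumerate(self.models):
            mask = assigned == k
            if mask.any():
                out[mask] = model.predict(x_new[mask])
        return out


def f(x):
    """Toy non-smooth objective with a jump of size 10 at x = 0.5."""
    return np.where(x < 0.5, x, x + 10.0)


x_train = np.linspace(0.0, 1.0, 11)
y_train = f(x_train)
x_test = np.array([0.44, 0.56])  # points on either side of the jump

single = ConstantMeanGP(x_train, y_train)
clustered = ClusteredGP(x_train, y_train)

err_single = np.abs(single.predict(x_test) - f(x_test)).max()
err_clustered = np.abs(clustered.predict(x_test) - f(x_test)).max()
print("max error, single GP:   ", err_single)
print("max error, clustered GP:", err_clustered)
```

On this toy problem the single smooth surrogate has to bridge the jump and is pulled away from the true values near x = 0.5, while each clustered component only sees one side of the discontinuity. A practical implementation would also fit kernel hyperparameters and choose the number of clusters, which this sketch leaves fixed.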

Read also

On Batch Bayesian Optimization

Sayak Ray Chowdhury, Aditya Gopalan, 2019

We present two algorithms for Bayesian optimization in the batch feedback setting, based on Gaussian process upper confidence bound and Thompson sampling approaches, along with frequentist regret guarantees and numerical results.

Machine learning

Bayesian Optimization under Heavy-tailed Payoffs

Sayak Ray Chowdhury, Aditya Gopalan, 2019

We consider black-box optimization of an unknown function in the nonparametric Gaussian process setting when the noise in the observed function values can be heavy-tailed. This is in contrast to existing literature that typically assumes sub-Gaussian noise distributions for queries. Under the assumption that the unknown function belongs to the Reproducing Kernel Hilbert Space (RKHS) induced by a kernel, we first show that an adaptation of the well-known GP-UCB algorithm with reward truncation enjoys sublinear $\tilde{O}(T^{\frac{2+\alpha}{2(1+\alpha)}})$ regret even with only the $(1+\alpha)$-th moments, $\alpha \in (0,1]$, of the reward distribution being bounded ($\tilde{O}$ hides logarithmic factors). However, for the common squared exponential (SE) and Matérn kernels, this is seen to be significantly larger than a fundamental $\Omega(T^{\frac{1}{1+\alpha}})$ lower bound on regret. We resolve this gap by developing novel Bayesian optimization algorithms, based on kernel approximation techniques, with regret bounds matching the lower bound in order for the SE kernel. We numerically benchmark the algorithms on environments based on both synthetic models and real-world data sets.

Machine learning

Multi-objective Bayesian Optimization using Pareto-frontier Entropy

Shinya Suzuki, Shion Takeno, Tomoyuki Tamura, 2019

This paper studies entropy-based multi-objective Bayesian optimization (MBO). Entropy search is a successful approach to Bayesian optimization. However, for MBO, existing entropy-based methods ignore the trade-off among objectives or introduce unreliable approximations. We propose a novel entropy-based MBO method called Pareto-frontier entropy search (PFES), which considers the entropy of the Pareto frontier, an essential notion of optimality in multi-objective problems. Our entropy can incorporate the trade-off relation of the optimal values, and further, we derive an analytical formula without introducing additional approximations or simplifications to the standard entropy search setting. We also show that our entropy computation is practically feasible by using a recursive decomposition technique known from studies of Pareto hyper-volume computation. Besides the usual MBO setting, in which all the objectives are observed simultaneously, we also consider the decoupled setting, in which the objective functions can be observed separately. PFES can easily adapt to the decoupled setting by considering the entropy of the marginal density for each output dimension. This approach incorporates the dependency among objectives conditioned on the Pareto frontier, which is ignored by the existing method. Our numerical experiments show the effectiveness of PFES on several benchmark datasets.

Machine learning

Bayesian Optimization with a Prior for the Optimum

Artur Souza, Luigi Nardi, Leonardo B. Oliveira, 2020

While Bayesian Optimization (BO) is a very popular method for optimizing expensive black-box functions, it fails to leverage the experience of domain experts. This causes BO to waste function evaluations on bad design choices (e.g., machine learning hyperparameters) that the expert already knows to work poorly. To address this issue, we introduce Bayesian Optimization with a Prior for the Optimum (BOPrO). BOPrO allows users to inject their knowledge into the optimization process in the form of priors about which parts of the input space will yield the best performance, rather than BO's standard priors over functions, which are much less intuitive for users. BOPrO then combines these priors with BO's standard probabilistic model to form a pseudo-posterior used to select which points to evaluate next. We show that BOPrO is around 6.67x faster than state-of-the-art methods on a common suite of benchmarks, and achieves a new state-of-the-art performance on a real-world hardware design application. We also show that BOPrO converges faster even if the priors for the optimum are not entirely accurate and that it robustly recovers from misleading priors.

Machine learning

No-regret Algorithms for Multi-task Bayesian Optimization

Sayak Ray Chowdhury, Aditya Gopalan, 2020

We consider multi-objective optimization (MOO) of an unknown vector-valued function in the non-parametric Bayesian optimization (BO) setting, with the aim of learning points on the Pareto front of the objectives. Most existing BO algorithms do not model the fact that the multiple objectives, or equivalently tasks, can share similarities, and even the few that do lack rigorous, finite-time regret guarantees that explicitly capture inter-task structure. In this work, we address this problem by modelling inter-task dependencies using a multi-task kernel and develop two novel BO algorithms based on random scalarizations of the objectives. Our algorithms employ vector-valued kernel regression as a stepping stone and belong to the upper confidence bound class of algorithms. Under a smoothness assumption that the unknown vector-valued function is an element of the reproducing kernel Hilbert space associated with the multi-task kernel, we derive worst-case regret bounds for our algorithms that explicitly capture the similarities between tasks. We numerically benchmark our algorithms on both synthetic and real-life MOO problems, and show the advantages offered by learning with multi-task kernels.

Machine learning