No Arabic abstract
In this paper, the problem of safe global maximization (it should not be confused with robust optimization) of expensive noisy black-box functions satisfying the Lipschitz condition is considered. The notion safe means that the objective function $f(x)$ during optimization should not violate a safety threshold, for instance, a certain a priori given value $h$ in a maximization problem. Thus, any new function evaluation (possibly corrupted by noise) must be performed at safe points only, namely, at points $y$ for which it is known that the objective function $f(y) > h$. The main difficulty here consists in the fact that the used optimization algorithm should ensure that the safety constraint will be satisfied at a point $y$ before evaluation of $f(y)$ will be executed. Thus, it is required both to determine the safe region $Omega$ within the search domain~$D$ and to find the global maximum within $Omega$. An additional difficulty consists in the fact that these problems should be solved in the presence of the noise. This paper starts with a theoretical study of the problem and it is shown that even though the objective function $f(x)$ satisfies the Lipschitz condition, traditional Lipschitz minorants and majorants cannot be used due to the presence of the noise. Then, a $delta$-Lipschitz framework and two algorithms using it are proposed to solve the safe global maximization problem. The first method determines the safe area within the search domain and the second one executes the global maximization over the found safe region. For both methods a number of theoretical results related to their functioning and convergence is established. Finally, numerical experiments confirming the reliability of the proposed procedures are performed.
We present mlrMBO, a flexible and comprehensive R toolbox for model-based optimization (MBO), also known as Bayesian optimization, which addresses the problem of expensive black-box optimization by approximating the given objective function through a surrogate regression model. It is designed for both single- and multi-objective optimization with mixed continuous, categorical and conditional parameters. Additional features include multi-point batch proposal, parallelization, visualization, logging and error-handling. mlrMBO is implemented in a modular fashion, such that single components can be easily replaced or adapted by the user for specific use cases, e.g., any regression learner from the mlr toolbox for machine learning can be used, and infill criteria and infill optimizers are easily exchangeable. We empirically demonstrate that mlrMBO provides state-of-the-art performance by comparing it on different benchmark scenarios against a wide range of other optimizers, including DiceOptim, rBayesianOptimization, SPOT, SMAC, Spearmint, and Hyperopt.
We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the global maximum. PES codifies this intractable acquisition function in terms of the expected reduction in the differential entropy of the predictive distribution. This reformulation allows PES to obtain approximations that are both more accurate and efficient than other alternatives such as Entropy Search (ES). Furthermore, PES can easily perform a fully Bayesian treatment of the model hyperparameters while ES cannot. We evaluate PES in both synthetic and real-world applications, including optimization problems in machine learning, finance, biotechnology, and robotics. We show that the increased accuracy of PES leads to significant gains in optimization performance.
Langevin dynamics (LD) has been proven to be a powerful technique for optimizing a non-convex objective as an efficient algorithm to find local minima while eventually visiting a global minimum on longer time-scales. LD is based on the first-order Langevin diffusion which is reversible in time. We study two variants that are based on non-reversible Langevin diffusions: the underdamped Langevin dynamics (ULD) and the Langevin dynamics with a non-symmetric drift (NLD). Adopting the techniques of Tzen, Liang and Raginsky (2018) for LD to non-reversible diffusions, we show that for a given local minimum that is within an arbitrary distance from the initialization, with high probability, either the ULD trajectory ends up somewhere outside a small neighborhood of this local minimum within a recurrence time which depends on the smallest eigenvalue of the Hessian at the local minimum or they enter this neighborhood by the recurrence time and stay there for a potentially exponentially long escape time. The ULD algorithms improve upon the recurrence time obtained for LD in Tzen, Liang and Raginsky (2018) with respect to the dependency on the smallest eigenvalue of the Hessian at the local minimum. Similar result and improvement are obtained for the NLD algorithm. We also show that non-reversible variants can exit the basin of attraction of a local minimum faster in discrete time when the objective has two local minima separated by a saddle point and quantify the amount of improvement. Our analysis suggests that non-reversible Langevin algorithms are more efficient to locate a local minimum as well as exploring the state space. Our analysis is based on the quadratic approximation of the objective around a local minimum. As a by-product of our analysis, we obtain optimal mixing rates for quadratic objectives in the 2-Wasserstein distance for two non-reversible Langevin algorithms we consider.
In this work, we study the problem of global optimization in univariate loss functions, where we analyze the regret of the popular lower bounding algorithms (e.g., Piyavskii-Shubert algorithm). For any given time $T$, instead of the widely available simple regret (which is the difference of the losses between the best estimation up to $T$ and the global optimizer), we study the cumulative regret up to that time. With a suitable lower bounding algorithm, we show that it is possible to achieve satisfactory cumulative regret bounds for different classes of functions. For Lipschitz continuous functions with the parameter $L$, we show that the cumulative regret is $O(Llog T)$. For Lipschitz smooth functions with the parameter $H$, we show that the cumulative regret is $O(H)$. We also analytically extend our results for a broader class of functions that covers both the Lipschitz continuous and smooth functions individually.
We propose a new distributed optimization algorithm for solving a class of constrained optimization problems in which (a) the objective function is separable (i.e., the sum of local objective functions of agents), (b) the optimization variables of distributed agents, which are subject to nontrivial local constraints, are coupled by global constraints, and (c) only noisy observations are available to estimate (the gradients of) local objective functions. In many practical scenarios, agents may not be willing to share their optimization variables with others. For this reason, we propose a distributed algorithm that does not require the agents to share their optimization variables with each other; instead, each agent maintains a local estimate of the global constraint functions and share the estimate only with its neighbors. These local estimates of constraint functions are updated using a consensus-type algorithm, while the local optimization variables of each agent are updated using a first-order method based on noisy estimates of gradient. We prove that, when the agents adopt the proposed algorithm, their optimization variables converge with probability 1 to an optimal point of an approximated problem based on the penalty method.