
A Theoretical and Empirical Comparison of Gradient Approximations in Derivative-Free Optimization

Added by Albert Berahas
Publication date: 2019
Language: English





In this paper, we analyze several methods for approximating gradients of noisy functions using only function values. These methods include finite differences, linear interpolation, Gaussian smoothing and smoothing on a sphere. The methods differ in the number of function values sampled, the choice of the sample points, and the way in which the gradient approximations are derived. For each method, we derive bounds on the number of samples and the sampling radius which guarantee favorable convergence properties for a line search or fixed step size descent method. To this end, we use the results in [Berahas et al., 2019] and show how each method can satisfy the sufficient conditions, possibly only with some sufficiently large probability at each iteration, as happens to be the case with Gaussian smoothing and smoothing on a sphere. Finally, we present numerical results evaluating the quality of the gradient approximations as well as their performance in conjunction with a line search derivative-free optimization algorithm.
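As a concrete illustration of two of these estimators, here is a minimal sketch (not the paper's code; the objective f, the sampling radius sigma, and the sample budget are illustrative assumptions): coordinate-wise forward differences, which spend n extra function values per gradient, versus a Monte Carlo estimator of the gradient of the Gaussian-smoothed function.

```python
# Minimal sketch of two gradient approximations built from function values only.
# Assumptions (not from the paper): f maps a numpy vector to a noisy scalar,
# sigma is the sampling radius, num_samples is the Monte Carlo budget.
import numpy as np

def forward_difference_gradient(f, x, h=1e-6):
    """Coordinate-wise forward differences: n extra function evaluations."""
    fx = f(x)
    g = np.zeros(x.size)
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g

def gaussian_smoothing_gradient(f, x, sigma=1e-2, num_samples=50, rng=None):
    """Monte Carlo estimate of the gradient of the Gaussian-smoothed function."""
    rng = np.random.default_rng() if rng is None else rng
    fx = f(x)
    g = np.zeros(x.size)
    for _ in range(num_samples):
        u = rng.standard_normal(x.size)
        g += (f(x + sigma * u) - fx) / sigma * u   # unbiased for the smoothed gradient
    return g / num_samples

# Usage on a quadratic with small additive noise
rng = np.random.default_rng(0)
noisy_f = lambda z: 0.5 * z @ z + 1e-4 * rng.standard_normal()
x0 = np.ones(5)
print(forward_difference_gradient(noisy_f, x0))
print(gaussian_smoothing_gradient(noisy_f, x0, rng=rng))
```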



Related research

The goal of this paper is to investigate an approach for derivative-free optimization that has not received sufficient attention in the literature and is yet one of the simplest to implement and parallelize. It consists of computing gradients of a smoothed approximation of the objective function (and constraints), and employing them within established codes. These gradient approximations are calculated by finite differences, with a differencing interval determined by the noise level in the functions and a bound on the second or third derivatives. It is assumed that the noise level is known or can be estimated by means of difference tables or sampling. The use of finite differences has been largely dismissed in the derivative-free optimization literature as too expensive in terms of function evaluations and/or as impractical when the objective function contains noise. The test results presented in this paper suggest that such views should be re-examined and that the finite-difference approach has much to recommend it. The tests compared NEWUOA, DFO-LS and COBYLA against the finite-difference approach on three classes of problems: general unconstrained problems, nonlinear least squares, and general nonlinear programs with equality constraints.
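For the forward-difference case, the trade-off that determines such a differencing interval can be made explicit. With $\varepsilon_f$ a bound on the noise in the function values and $M$ a bound on $|f''|$ near $x$ (a standard estimate, not quoted from this paper), the approximation error satisfies
\[
\left| \frac{f(x+h) - f(x)}{h} - f'(x) \right| \;\le\; \frac{M h}{2} + \frac{2\varepsilon_f}{h},
\]
and this bound is minimized at $h^{*} = 2\sqrt{\varepsilon_f / M}$, which is why the interval is tied to both the noise level and a derivative bound.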
We propose a new class of rigorous methods for derivative-free optimization with the aim of delivering efficient and robust numerical performance for functions of all types, from smooth to non-smooth, and under different noise regimes. To this end, we have developed Full-Low Evaluation methods, organized around two main types of iterations. The first iteration type (Full-Eval) is expensive in function evaluations, but exhibits good performance in the smooth and non-noisy cases. For the theory, we consider a line search based on an approximate gradient, backtracking until a sufficient decrease condition is satisfied. In practice, the gradient was approximated via finite differences, and the direction was calculated by a quasi-Newton step (BFGS). The second iteration type (Low-Eval) is cheap in function evaluations, yet more robust in the presence of noise or non-smoothness. For the theory, we consider direct search, and in practice we use probabilistic direct search with one random direction and its negative. A switch condition from Full-Eval to Low-Eval iterations was developed based on the values of the line-search and direct-search stepsizes. If enough Full-Eval steps are taken, we derive a complexity result of gradient-descent type. Under failure of Full-Eval, the Low-Eval iterations become the drivers of convergence, yielding non-smooth convergence. Full-Low Evaluation methods are shown to be efficient and robust in practice across problems with different levels of smoothness and noise.
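As a rough sketch of the cheap iteration type described above (illustrative names and constants, not the authors' implementation), a probabilistic direct-search poll with one random direction and its negative might look like this:

```python
# Sketch of a direct-search poll step: try +d and -d for a random unit
# direction d, accept on sufficient decrease, and update the stepsize.
# The forcing function rho and the expansion/contraction factors are assumptions.
import numpy as np

def direct_search_step(f, x, alpha, rho=lambda a: 1e-4 * a**2, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    d = rng.standard_normal(x.size)
    d /= np.linalg.norm(d)
    fx = f(x)
    for direction in (d, -d):
        trial = x + alpha * direction
        if f(trial) < fx - rho(alpha):    # sufficient decrease achieved
            return trial, 2.0 * alpha     # successful poll: expand the stepsize
    return x, 0.5 * alpha                 # unsuccessful poll: shrink the stepsize
```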
This paper presents a finite difference quasi-Newton method for the minimization of noisy functions. The method takes advantage of the scalability and power of BFGS updating, and employs an adaptive procedure for choosing the differencing interval $h$ based on the noise estimation techniques of Hamming (2012) and Moré and Wild (2011). This noise estimation procedure and the selection of $h$ are inexpensive but not always accurate, and to prevent failures the algorithm incorporates a recovery mechanism that takes appropriate action when the line search procedure is unable to produce an acceptable point. A novel convergence analysis is presented that considers the effect of a noisy line search procedure. Numerical experiments comparing the method to a function-interpolating trust region method are presented.
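A rough sketch of the underlying idea (assuming the noise level eps_f is known and scipy is available; the paper's adaptive interval selection and recovery mechanism are not reproduced here) is to choose $h$ from the noise level and feed the resulting finite-difference gradient to an off-the-shelf BFGS routine:

```python
# Illustrative only: noise-aware differencing interval + standard BFGS.
import numpy as np
from scipy.optimize import minimize

def fd_gradient(f, x, h):
    """Forward-difference gradient with a fixed interval h."""
    fx = f(x)
    g = np.zeros(x.size)
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g

def minimize_noisy(f, x0, eps_f, second_derivative_bound=1.0):
    # h chosen from the noise level and a crude bound on |f''| (an assumption)
    h = 2.0 * np.sqrt(eps_f / second_derivative_bound)
    return minimize(f, x0, method="BFGS", jac=lambda x: fd_gradient(f, x, h))
```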
In this paper, we propose a new method based on the Sliding Algorithm from Lan (2016, 2019) for the convex composite optimization problem that includes two terms: a smooth one and a non-smooth one. Our method uses a stochastic noisy zeroth-order oracle for the non-smooth part and a first-order oracle for the smooth part. To the best of our knowledge, this is the first method in the literature that uses such a mixed oracle for composite optimization. We prove a convergence rate for the new method that matches the corresponding rate for the first-order method up to a factor proportional to the dimension of the space or, in some cases, its squared logarithm. We apply this method to decentralized distributed optimization and derive upper bounds on the number of communication rounds that match known lower bounds. Moreover, our bound on the number of zeroth-order oracle calls per node matches the corresponding state-of-the-art bound for first-order decentralized distributed optimization up to a factor proportional to the dimension of the space or, in some cases, its squared logarithm.
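A typical two-point zeroth-order estimator of this kind (a generic construction, not necessarily the exact oracle used in the paper) smooths the non-smooth term $g$ over a ball of radius $\tau$ and estimates the gradient of the smoothed function from two function values:
\[
\widetilde{\nabla} g_\tau(x) \;=\; \frac{n}{2\tau}\,\bigl(g(x+\tau e) - g(x-\tau e)\bigr)\, e,
\qquad e \sim \mathrm{Uniform}(S^{n-1}),
\]
while the smooth term is handled with its exact first-order oracle; the dimension-dependent factor in such rates typically enters through the variance of the zeroth-order estimate.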
A novel derivative-free algorithm, optimization by moving ridge functions (OMoRF), for unconstrained and bound-constrained optimization is presented. This algorithm couples trust region methodologies with output-based dimension reduction to accelerate convergence of model-based optimization strategies. The dimension-reducing subspace is updated as the trust region moves through the function domain, allowing OMoRF to be applied to functions with no known global low-dimensional structure. Furthermore, its low computational requirement allows it to make rapid progress when optimizing high-dimensional functions. Its performance is examined on a set of test problems of moderate to high dimension and a high-dimensional design optimization problem. The results show that OMoRF compares favourably to other common derivative-free optimization methods, even for functions in which no underlying global low-dimensional structure is known.
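As a generic sketch of output-based dimension reduction of this flavor (an active-subspace-style construction with illustrative names; OMoRF's actual ridge-function fitting and trust-region management are not reproduced here):

```python
# Illustrative only: estimate a dominant subspace from approximate gradients
# and fit a cheap surrogate in the reduced coordinates around a trust-region
# center. All names, sample counts, and the linear model are assumptions.
import numpy as np

def estimate_subspace(approx_gradients, k):
    """Top-k left singular vectors of a matrix whose columns are gradient estimates."""
    G = np.column_stack(approx_gradients)
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    return U[:, :k]                                  # n-by-k orthonormal basis

def reduced_linear_model(f, x_center, U, radius, num_samples=20, rng=None):
    """Least-squares fit of f in the reduced coordinates s = U^T (x - x_center)."""
    rng = np.random.default_rng() if rng is None else rng
    S = rng.uniform(-radius, radius, size=(num_samples, U.shape[1]))
    vals = np.array([f(x_center + U @ s) for s in S])
    A = np.hstack([np.ones((num_samples, 1)), S])    # model: c + s @ g_reduced
    coef, *_ = np.linalg.lstsq(A, vals, rcond=None)
    return coef[0], coef[1:]                         # intercept, reduced gradient
```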