Superiorization reduces, though it does not necessarily minimize, the value of a target function while seeking constraints-compatibility. This is done by taking a solely feasibility-seeking algorithm, analyzing its perturbation resilience, and proactively perturbing its iterates accordingly, steering them toward a feasible point with a reduced target function value. When the perturbation steps are computationally efficient, this enables generation of a superior result at essentially the same computational cost as that of the original feasibility-seeking algorithm. In this work, we refine previous formulations of the superiorization method to create a more general framework, enabling target function reduction steps that do not require partial derivatives of the target function. In perturbations that use partial derivatives, the step-sizes in the perturbation phase of the superiorization method are chosen independently of the choice of the nonascent directions. This is no longer true when component-wise perturbations are employed: in that case, the step-sizes must be linked to the choice of the nonascent direction in every step. Besides presenting and validating these notions, we give a computational demonstration of superiorization with component-wise perturbations for a problem of computerized tomography image reconstruction.
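To make the interplay between feasibility-seeking and perturbation concrete, the following is a minimal sketch of a superiorized loop, assuming a generic feasibility-seeking operator `project`, a user-supplied nonascent direction, and summable step-sizes beta0*gamma^k. It illustrates the general idea rather than the authors' exact algorithm; the Kaczmarz sweep and quadratic target in the usage example are illustrative assumptions.

```python
import numpy as np

def superiorize(x0, project, nonascent_dir, phi, n_iters=200,
                beta0=1.0, gamma=0.99):
    """Minimal superiorization loop (a sketch, not the authors' exact method).

    project       : one sweep of a feasibility-seeking algorithm, assumed
                    resilient to bounded perturbations.
    nonascent_dir : returns a nonascent direction of the target phi at x.
    beta0*gamma^k : summable step-sizes, keeping perturbations bounded.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(n_iters):
        v = nonascent_dir(x)                  # e.g., -grad(phi)/||grad(phi)||
        x_pert = x + beta0 * gamma**k * v     # target-reducing perturbation
        if phi(x_pert) <= phi(x):             # keep only nonincreasing steps
            x = x_pert
        x = project(x)                        # feasibility-seeking step
    return x

# Hypothetical usage: linear feasibility Ax = b with a quadratic target.
if __name__ == "__main__":
    A = np.array([[1.0, 2.0], [3.0, 1.0]])
    b = np.array([3.0, 4.0])
    def project(x):                            # one Kaczmarz sweep over Ax = b
        for a_i, b_i in zip(A, b):
            x = x + (b_i - a_i @ x) / (a_i @ a_i) * a_i
        return x
    phi = lambda x: 0.5 * np.dot(x, x)         # target: squared norm
    dir_ = lambda x: -x / (np.linalg.norm(x) + 1e-12)
    print(superiorize(np.zeros(2), project, dir_, phi))
```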
The superiorization methodology is intended to work with the input data of constrained minimization problems, that is, a target function and a set of constraints. However, it is based on a way of thinking antipodal to the one that leads to constrained minimization methods. Instead of adapting unconstrained minimization algorithms to handle constraints, it adapts feasibility-seeking algorithms to reduce (not necessarily minimize) target function values. This is done by inserting target-function-reducing perturbations into a feasibility-seeking algorithm while retaining its feasibility-seeking ability and without paying a high computational price. A superiorized algorithm that employs component-wise target function reduction steps is presented. This enables derivative-free superiorization (DFS), meaning that superiorization can be applied to target functions that have no calculable partial derivatives or subgradients. The numerical behavior of our derivative-free superiorization algorithm is illustrated on a data set generated by simulating a problem of image reconstruction from projections. We present a tool (which we call a proximity-target curve) for deciding which of two iterative methods is better for solving a particular problem. The proximity-target curves of our experiments demonstrate the advantage of the proposed derivative-free superiorization algorithm.
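A component-wise target reduction step of the kind described can be sketched as follows; the sweep order, fixed step-size, and acceptance rule here are illustrative assumptions, not the authors' exact specification. Note how the step-size and the nonascent direction are chosen jointly: a coordinate step of size beta is kept only if it does not increase the target.

```python
import numpy as np

def componentwise_reduction(x, phi, beta):
    """One derivative-free, component-wise target reduction sweep (a sketch).

    For each coordinate, a step of size beta is tried in both directions
    and accepted only if it does not increase phi, so the step-size and
    the nonascent direction are determined together, using function
    values only.
    """
    x = x.copy()
    fx = phi(x)
    for j in range(x.size):
        for sign in (+1.0, -1.0):
            trial = x.copy()
            trial[j] += sign * beta            # try a coordinate step
            f_trial = phi(trial)
            if f_trial <= fx:                  # nonascent: accept and move on
                x, fx = trial, f_trial
                break
    return x
```

Plugging such a sweep in as the perturbation phase of the superiorization loop sketched earlier yields a derivative-free superiorized algorithm: no partial derivatives or subgradients of the target are ever evaluated.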
In this paper, we propose a new method, based on the sliding algorithm of Lan (2016, 2019), for the convex composite optimization problem whose objective consists of two terms: a smooth one and a non-smooth one. Our method uses a stochastic noisy zeroth-order oracle for the non-smooth part and a first-order oracle for the smooth part. To the best of our knowledge, this is the first method in the literature that uses such a mixed oracle for composite optimization. We prove a convergence rate for the new method that matches the corresponding rate for the first-order method up to a factor proportional to the dimension of the space or, in some cases, its squared logarithm. We apply this method to decentralized distributed optimization and derive upper bounds on the number of communication rounds that match known lower bounds. Moreover, our bound on the number of zeroth-order oracle calls per node matches the analogous state-of-the-art bound for first-order decentralized distributed optimization up to a factor proportional to the dimension of the space or, in some cases, its squared logarithm.
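A zeroth-order oracle for the non-smooth term can be realized, for instance, by a randomized two-point estimator; the following is a standard construction of that kind (the paper's precise smoothing scheme and batch sizes may differ). The dimension factor in the convergence rate stems from the variance of such estimators.

```python
import numpy as np

def two_point_zeroth_order_grad(f, x, tau=1e-4, rng=None):
    """Randomized two-point gradient estimator (a standard construction,
    not necessarily the paper's exact scheme).

    Returns d/(2*tau) * (f(x + tau*e) - f(x - tau*e)) * e for a random
    unit vector e; in expectation this approximates the gradient of a
    smoothed version of f, using only (possibly noisy) function values.
    """
    rng = rng or np.random.default_rng()
    d = x.size
    e = rng.standard_normal(d)
    e /= np.linalg.norm(e)                     # uniform direction on the sphere
    return d * (f(x + tau * e) - f(x - tau * e)) / (2.0 * tau) * e
```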
We propose a new class of rigorous methods for derivative-free optimization with the aim of delivering efficient and robust numerical performance for functions of all types, from smooth to non-smooth, and under different noise regimes. To this end, we have developed Full-Low Evaluation methods, organized around two main types of iterations. The first iteration type (Full-Eval) is expensive in function evaluations but exhibits good performance in the smooth and non-noisy cases. For the theory, we consider a line search based on an approximate gradient, backtracking until a sufficient decrease condition is satisfied. In practice, the gradient is approximated via finite differences, and the direction is computed by a quasi-Newton (BFGS) step. The second iteration type (Low-Eval) is cheap in function evaluations, yet more robust in the presence of noise or non-smoothness. For the theory, we consider direct search, and in practice we use probabilistic direct search with one random direction and its negative. A condition for switching from Full-Eval to Low-Eval iterations is based on the values of the line-search and direct-search stepsizes. If enough Full-Eval steps are taken, we derive a complexity result of gradient-descent type. When Full-Eval fails, the Low-Eval iterations become the drivers of convergence, yielding convergence in the non-smooth sense. Full-Low Evaluation methods are shown to be efficient and robust in practice across problems with different levels of smoothness and noise.
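The following skeleton illustrates how the two iteration types and the switch between them might fit together. It is a simplified sketch: plain finite-difference gradient descent stands in for the BFGS direction, and a generic small-step-size test stands in for the authors' exact switch condition.

```python
import numpy as np

def full_low_eval(f, x0, n_iters=100, alpha0=1.0, delta0=1.0, eps=1e-8):
    """Skeleton of a Full-Low Evaluation loop (a simplified sketch)."""
    rng = np.random.default_rng(0)
    x, d = np.asarray(x0, dtype=float), len(x0)
    alpha, delta = alpha0, delta0
    mode = "full"
    for _ in range(n_iters):
        if mode == "full":
            # Full-Eval: finite-difference gradient + backtracking line search.
            h = np.sqrt(eps)
            g = np.array([(f(x + h * np.eye(d)[i]) - f(x)) / h
                          for i in range(d)])
            alpha = alpha0
            while f(x - alpha * g) > f(x) - 1e-4 * alpha * (g @ g) and alpha > eps:
                alpha *= 0.5                   # backtrack to sufficient decrease
            if alpha <= eps:
                mode = "low"                   # switch: Full-Eval step failed
            else:
                x = x - alpha * g
        else:
            # Low-Eval: direct search with one random direction and its negative.
            v = rng.standard_normal(d)
            v /= np.linalg.norm(v)
            for s in (+1.0, -1.0):
                if f(x + delta * s * v) < f(x) - delta**2:
                    x = x + delta * s * v
                    delta *= 2.0               # expand on success
                    break
            else:
                delta *= 0.5                   # shrink on failure
            if delta < eps:
                mode, delta = "full", delta0   # give Full-Eval another chance
    return x
```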
This paper addresses a distributed optimization problem in a communication network where nodes are active sporadically. Each active node applies a learning method to control its action so as to maximize the global utility function, defined as the sum of the local utility functions of the active nodes. We deal with a stochastic optimization problem in which the utility functions are disturbed by a non-additive stochastic process. We consider the more challenging situation where the learning method must rely only on a scalar approximation of the utility function, rather than its closed-form expression, so that the typical gradient descent method cannot be applied. This setting is quite realistic when the network is affected by a stochastic and time-varying process and each node cannot have full knowledge of the network state. We propose a distributed optimization algorithm and prove its almost sure convergence to the optimum. A convergence rate is also derived under the additional assumption that the objective function is strongly concave. Numerical results are presented to support our claims.
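An update driven only by scalar utility observations can be sketched with a simultaneous-perturbation (SPSA-style) step, shown below. The step-size schedules, perturbation distribution, and function name are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def spsa_step(u_scalar, x, k, a0=0.1, c0=0.1, rng=None):
    """One simultaneous-perturbation ascent step from noisy scalar utility
    observations only (an SPSA-style sketch; the paper's algorithm and
    schedules may differ).

    u_scalar(x) returns a noisy scalar value of the global utility; no
    gradient or closed-form expression is available.
    """
    rng = rng or np.random.default_rng()
    a_k = a0 / (k + 1)                 # diminishing ascent step size
    c_k = c0 / (k + 1) ** 0.25         # diminishing perturbation size
    delta = rng.choice([-1.0, 1.0], size=x.size)   # Rademacher directions
    # Gradient estimate from two scalar observations of the utility.
    g_hat = (u_scalar(x + c_k * delta) - u_scalar(x - c_k * delta)) \
            / (2.0 * c_k) / delta
    return x + a_k * g_hat             # ascend the estimated gradient
```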
This paper presents a finite-difference quasi-Newton method for the minimization of noisy functions. The method takes advantage of the scalability and power of BFGS updating, and employs an adaptive procedure for choosing the differencing interval $h$ based on the noise estimation techniques of Hamming (2012) and Moré and Wild (2011). This noise estimation procedure and the selection of $h$ are inexpensive but not always accurate, so to prevent failures the algorithm incorporates a recovery mechanism that takes appropriate action when the line search procedure is unable to produce an acceptable point. A novel convergence analysis is presented that considers the effect of a noisy line search procedure. Numerical experiments comparing the method to a function-interpolating trust-region method are presented.
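The core idea of tying the differencing interval to an estimated noise level can be sketched as follows. The formula below balances truncation and noise errors for a forward difference and is a textbook approximation under stated assumptions, not the paper's exact procedure or recovery mechanism.

```python
import numpy as np

def forward_diff_gradient(f, x, eps_f, mu=1.0):
    """Forward-difference gradient with a noise-adaptive interval (a sketch).

    Given a noise level eps_f (e.g., from a Hamming / More-Wild style
    estimate) and a bound mu on |f''|, h = 2*sqrt(eps_f/mu) roughly
    balances the truncation error (~ h*mu/2) against the noise error
    (~ 2*eps_f/h) of each forward difference.
    """
    x = np.asarray(x, dtype=float)
    h = 2.0 * np.sqrt(eps_f / mu)      # noise-adaptive differencing interval
    fx = f(x)
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - fx) / h     # forward difference in coordinate i
    return g
```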