We describe convergence acceleration schemes for multistep optimization algorithms. The extrapolated solution is written as a nonlinear average of the iterates produced by the original optimization method. Our analysis does not need the underlying fixed-point operator to be symmetric, hence handles e.g. algorithms with momentum terms such as Nesterovs accelerated method, or primal-dual methods. The weights are computed via a simple linear system and we analyze performance in both online and offline modes. We use Crouzeixs conjecture to show that acceleration performance is controlled by the solution of a Chebyshev problem on the numerical range of a non-symmetric operator modeling the behavior of iterates near the optimum. Numerical experiments are detailed on logistic regression problems.
We consider a generic empirical composition optimization problem, where there are empirical averages present both outside and inside nonlinear loss functions. Such a problem is of interest in various machine learning applications, and cannot be directly solved by standard methods such as stochastic gradient descent. We take a novel approach to solving this problem by reformulating the original minimization objective into an equivalent min-max objective, which brings out all the empirical averages that are originally inside the nonlinear loss functions. We exploit the rich structures of the reformulated problem and develop a stochastic primal-dual algorithm, SVRPDA-I, to solve the problem efficiently. We carry out extensive theoretical analysis of the proposed algorithm, obtaining the convergence rate, the computation complexity and the storage complexity. In particular, the algorithm is shown to converge at a linear rate when the problem is strongly convex. Moreover, we also develop an approximate version of the algorithm, named SVRPDA-II, which further reduces the memory requirement. Finally, we evaluate our proposed algorithms on several real-world benchmarks, and experimental results show that the proposed algorithms significantly outperform existing techniques.
We study the canonical quantity-based network revenue management (NRM) problem where the decision-maker must irrevocably accept or reject each arriving customer request with the goal of maximizing the total revenue given limited resources. The exact solution to the problem by dynamic programming is computationally intractable due to the well-known curse of dimensionality. Existing works in the literature make use of the solution to the deterministic linear program (DLP) to design asymptotically optimal algorithms. Those algorithms rely on repeatedly solving DLPs to achieve near-optimal regret bounds. It is, however, time-consuming to repeatedly compute the DLP solutions in real time, especially in large-scale problems that may involve hundreds of millions of demand units. In this paper, we propose innovative algorithms for the NRM problem that are easy to implement and do not require solving any DLPs. Our algorithm achieves a regret bound of $O(log k)$, where $k$ is the system size. To the best of our knowledge, this is the first NRM algorithm that (i) has an $o(sqrt{k})$ asymptotic regret bound, and (ii) does not require solving any DLPs.
We propose an extended primal-dual algorithm framework for solving a general nonconvex optimization model. This work is motivated by image reconstruction problems in a class of nonlinear imaging, where the forward operator can be formulated as a nonlinear convex function with respect to the reconstructed image. Using the proposed framework, we put forward six specific iterative schemes, and present their detailed mathematical explanation. We also establish the relationship to existing algorithms. Moreover, under proper assumptions, we analyze the convergence of the schemes for the general model when the optimal dual variable regarding the nonlinear operator is non-vanishing. As a representative, the image reconstruction for spectral computed tomography is used to demonstrate the effectiveness of the proposed algorithm framework. By special properties of the concrete problem, we further prove the convergence of these customized schemes when the optimal dual variable regarding the nonlinear operator is vanishing. Finally, the numerical experiments show that the proposed algorithm has good performance on image reconstruction for various data with non-standard scanning configuration.
The importance of an adequate inner loop starting point (as opposed to a sufficient inner loop stopping rule) is discussed in the context of a numerical optimization algorithm consisting of nested primal-dual proximal-gradient iterations. While the number of inner iterations is fixed in advance, convergence of the whole algorithm is still guaranteed by virtue of a warm-start strategy for the inner loop, showing that inner loop starting rules can be just as effective as stopping rules for guaranteeing convergence. The algorithm itself is applicable to the numerical solution of convex optimization problems defined by the sum of a differentiable term and two possibly non-differentiable terms. One of the latter terms should take the form of the composition of a linear map and a proximable function, while the differentiable term needs an accessible gradient. The algorithm reduces to the classical proximal gradient algorithm in certain special cases and it also generalizes other existing algorithms. In addition, under some conditions of strong convexity, we show a linear rate of convergence.
This paper studies the distributed optimization problem where the objective functions might be nondifferentiable and subject to heterogeneous set constraints. Unlike existing subgradient methods, we focus on the case when the exact subgradients of the local objective functions can not be accessed by the agents. To solve this problem, we propose a projected primal-dual dynamics using only the objective functions approximate subgradients. We first prove that the formulated optimization problem can only be solved with an approximate error depending upon the accuracy of the available subgradients. Then, we show the exact solvability of this optimization problem if the accumulated approximation error is not too large. After that, we also give a novel componentwise normalized variant to improve the transient behavior of the convergent sequence. The effectiveness of our algorithms is verified by a numerical example.