Linearly-Convergent FISTA Variant for Composite Optimization with Duality

68 0 0.0 ( 0 )

Download Cite

Added by Case Garner

Publication date 2021

fields

and research's language is English

Authors Casey Garner - Shuzhong Zhang

Optimization and Control

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Many large-scale optimization problems can be expressed as composite optimization models. Accelerated first-order methods such as the fast iterative shrinkage-thresholding algorithm (FISTA) have proven effective for numerous large composite models. In this paper, we present a new variation of FISTA, to be called C-FISTA, which obtains global linear convergence for a broader class of composite models than many of the latest FISTA variants. We demonstrate the versatility and effectiveness of C-FISTA by showing C-FISTA outperforms current first-order solvers on both group Lasso and group logistic regression models. Furthermore, we utilize Fenchel duality to prove C-FISTA provides global linear convergence for a large class of convex models without the loss of global linear convergence.

rate research

Average Curvature FISTA for Nonconvex Smooth Composite Optimization Problems

70 - Jiaming Liang , Renato D. C. Monteiro 2021

A previous authors paper introduces an accelerated composite gradient (ACG) variant, namely AC-ACG, for solving nonconvex smooth composite optimization (N-SCO) problems. In contrast to other ACG variants, AC-ACG estimates the local upper curvature of the N-SCO problem by using the average of the observed upper-Lipschitz curvatures obtained during the previous iterations, and uses this estimation and two composite resolvent evaluations to compute the next iterate. This paper presents an alternative FISTA-type ACG variant, namely AC-FISTA, which has the following additional features: i) it performs an average of one composite resolvent evaluation per iteration; and ii) it estimates the local upper curvature by using the average of the previously observed upper (instead of upper-Lipschitz) curvatures. These two properties acting together yield a practical AC-FISTA variant which substantially outperforms earlier ACG variants, including the AC-ACG variants discussed in the aforementioned authors paper.

Optimization and Control

A Linearly Convergent Proximal Gradient Algorithm for Decentralized Optimization

84 - Sulaiman A. Alghunaim , Kun Yuan , Ali H. Sayed 2019

Decentralized optimization is a powerful paradigm that finds applications in engineering and learning design. This work studies decentralized composite optimization problems with non-smooth regularization terms. Most existing gradient-based proximal decentralized methods are known to converge to the optimal solution with sublinear rates, and it remains unclear whether this family of methods can achieve global linear convergence. To tackle this problem, this work assumes the non-smooth regularization term is common across all networked agents, which is the case for many machine learning problems. Under this condition, we design a proximal gradient decentralized algorithm whose fixed point coincides with the desired minimizer. We then provide a concise proof that establishes its linear convergence. In the absence of the non-smooth term, our analysis technique covers the well known EXTRA algorithm and provides useful bounds on the convergence rate and step-size.

Optimization and Control Distributed Parallel and Cluster Computing

A Majorized ADMM with Indefinite Proximal Terms for Linearly Constrained Convex Composite Optimization

132 - Min Li , Defeng Sun , Kim-Chuan Toh 2014

This paper presents a majorized alternating direction method of multipliers (ADMM) with indefinite proximal terms for solving linearly constrained $2$-block convex composite optimization problems with each block in the objective being the sum of a non-smooth convex function and a smooth convex function, i.e., $min_{x in {cal X}, ; y in {cal Y}}{p(x)+f(x) + q(y)+g(y)mid A^* x+B^* y = c}$. By choosing the indefinite proximal terms properly, we establish the global convergence and $O(1/k)$ ergodic iteration-complexity of the proposed method for the step-length $tau in (0, (1+sqrt{5})/2)$. The computational benefit of using indefinite proximal terms within the ADMM framework instead of the current requirement of positive semidefinite ones is also demonstrated numerically. This opens up a new way to improve the practical performance of the ADMM and related methods.

Optimization and Control

Linearly Convergent First-Order Algorithms for Semi-definite Programming

120 - Cong D. Dang , Guanghui Lan 2013

In this paper, we consider two formulations for Linear Matrix Inequalities (LMIs) under Slater type constraint qualification assumption, namely, SDP smooth and non-smooth formulations. We also propose two first-order linearly convergent algorithms for solving these formulations. Moreover, we introduce a bundle-level method which converges linearly uniformly for both smooth and non-smooth problems and does not require any smoothness information. The convergence properties of these algorithms are also discussed. Finally, we consider a special case of LMIs, linear system of inequalities, and show that a linearly convergent algorithm can be obtained under a weaker assumption.

Optimization and Control

Derivative-Free Method For Composite Optimization With Applications To Decentralized Distributed Optimization

133 - Aleksandr Beznosikov , Eduard Gorbunov , Alexander Gasnikov 2019

In this paper, we propose a new method based on the Sliding Algorithm from Lan(2016, 2019) for the convex composite optimization problem that includes two terms: smooth one and non-smooth one. Our method uses the stochastic noised zeroth-order oracle for the non-smooth part and the first-order oracle for the smooth part. To the best of our knowledge, this is the first method in the literature that uses such a mixed oracle for the composite optimization. We prove the convergence rate for the new method that matches the corresponding rate for the first-order method up to a factor proportional to the dimension of the space or, in some cases, its squared logarithm. We apply this method for the decentralized distributed optimization and derive upper bounds for the number of communication rounds for this method that matches known lower bounds. Moreover, our bound for the number of zeroth-order oracle calls per node matches the similar state-of-the-art bound for the first-order decentralized distributed optimization up to to the factor proportional to the dimension of the space or, in some cases, even its squared logarithm.

Optimization and Control