No Arabic abstract
Ill-posed linear inverse problems appear in many scientific setups, and are typically addressed by solving optimization problems, which are composed of data fidelity and prior terms. Recently, several works have considered a back-projection (BP) based fidelity term as an alternative to the common least squares (LS), and demonstrated excellent results for popular inverse problems. These works have also empirically shown that using the BP term, rather than the LS term, requires fewer iterations of optimization algorithms. In this paper, we examine the convergence rate of the projected gradient descent (PGD) algorithm for the BP objective. Our analysis allows to identify an inherent source for its faster convergence compared to using the LS objective, while making only mild assumptions. We also analyze the more general proximal gradient method under a relaxed contraction condition on the proximal mapping of the prior. This analysis further highlights the advantage of BP when the linear measurement operator is badly conditioned. Numerical experiments with both $ell_1$-norm and GAN-based priors corroborate our theoretical results.
Communication has been seen as a significant bottleneck in industrial applications over large-scale networks. To alleviate the communication burden, sign-based optimization algorithms have gained popularity recently in both industrial and academic communities, which is shown to be closely related to adaptive gradient methods, such as Adam. Along this line, this paper investigates faster convergence for a variant of sign-based gradient descent, called scaled signGD, in three cases: 1) the objective function is strongly convex; 2) the objective function is nonconvex but satisfies the Polyak-Lojasiewicz (PL) inequality; 3) the gradient is stochastic, called scaled signGD in this case. For the first two cases, it can be shown that the scaled signGD converges at a linear rate. For case 3), the algorithm is shown to converge linearly to a neighborhood of the optimal value when a constant learning rate is employed, and the algorithm converges at a rate of $O(1/k)$ when using a diminishing learning rate, where $k$ is the iteration number. The results are also extended to the distributed setting by majority vote in a parameter-server framework. Finally, numerical experiments on logistic regression are performed to corroborate the theoretical findings.
We establish a convergence theorem for a certain type of stochastic gradient descent, which leads to a convergent variant of the back-propagation algorithm
In this paper, we investigate the non-asymptotic stationary convergence behavior of Stochastic Mirror Descent (SMD) for nonconvex optimization. We focus on a general class of nonconvex nonsmooth stochastic optimization problems, in which the objective can be decomposed into a relatively weakly convex function (possibly non-Lipschitz) and a simple non-smooth convex regularizer. We prove that SMD, without the use of mini-batch, is guaranteed to converge to a stationary point in a convergence rate of $ mathcal{O}(1/sqrt{t}) $. The efficiency estimate matches with existing results for stochastic subgradient method, but is evaluated under a stronger stationarity measure. Our convergence analysis applies to both the original SMD and its proximal version, as well as the deterministic variants, for solving relatively weakly convex problems.
A dynamical system is defined in terms of the gradient of a payoff function. Dynamical variables are of two types, ascent and descent. The ascent variables move in the direction of the gradient, while the descent variables move in the opposite direction. Dynamical systems of this form or very similar forms have been studied in diverse fields such as game theory, optimization, neural networks, and population biology. Gradient descent-ascent is approximated as a Newtonian dynamical system that conserves total energy, defined as the sum of the kinetic energy and a potential energy that is proportional to the payoff function. The error of the approximation is a residual force that violates energy conservation. If the residual force is purely dissipative, then the energy serves as a Lyapunov function, and convergence of bounded trajectories to steady states is guaranteed. A previous convergence theorem due to Kose and Uzawa required the payoff function to be convex in the descent variables, and concave in the ascent variables. Here the assumption is relaxed, so that the payoff function need only be globally `less convex or `more concave in the ascent variables than in the descent variables. Such relative convexity conditions allow the existence of multiple steady states, unlike the convex-concave assumption. When combined with sufficient conditions that imply the existence of a minimax equilibrium, boundedness of trajectories is also assured.
In this work, we analyze the global convergence property of coordinate gradient descent with random choice of coordinates and stepsizes for non-convex optimization problems. Under generic assumptions, we prove that the algorithm iterate will almost surely escape strict saddle points of the objective function. As a result, the algorithm is guaranteed to converge to local minima if all saddle points are strict. Our proof is based on viewing coordinate descent algorithm as a nonlinear random dynamical system and a quantitative finite block analysis of its linearization around saddle points.