Do you want to publish a course? Click here

Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart for Nonconvex Optimization

133   0   0.0 ( 0 )
 Added by Yi Zhou
 Publication date 2020
and research's language is English




Ask ChatGPT about the research

Various types of parameter restart schemes have been proposed for accelerated gradient algorithms to facilitate their practical convergence in convex optimization. However, the convergence properties of accelerated gradient algorithms under parameter restart remain obscure in nonconvex optimization. In this paper, we propose a novel accelerated proximal gradient algorithm with parameter restart (named APG-restart) for solving nonconvex and nonsmooth problems. Our APG-restart is designed to 1) allow for adopting flexible parameter restart schemes that cover many existing ones; 2) have a global sub-linear convergence rate in nonconvex and nonsmooth optimization; and 3) have guaranteed convergence to a critical point and have various types of asymptotic convergence rates depending on the parameterization of local geometry in nonconvex and nonsmooth optimization. Numerical experiments demonstrate the effectiveness of our proposed algorithm.



rate research

Read More

166 - Yangyang Xu 2020
Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs for both convex and nonconvex cases. Most of them require a large number of samples in some or all iterations of the improved SGMs. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm can achieve the optimal complexity result $O(varepsilon^{-3})$ to produce a stochastic $varepsilon$-stationary solution, if a mean-squared smoothness condition holds and $Theta(varepsilon^{-1})$ samples are available for the initial update. Different from existing optimal methods, PStorm can still achieve a near-optimal complexity result $tilde{O}(varepsilon^{-3})$ by using only one or $O(1)$ samples in every update. With this property, PStorm can be applied to online learning problems that favor real-time decisions based on one or $O(1)$ new observations. In addition, for large-scale machine learning problems, PStorm can generalize better by small-batch training than other optimal methods that require large-batch training and the vanilla SGM, as we demonstrate on training a sparse fully-connected neural network and a sparse convolutional neural network.
Stochastic gradient descent (SGD) is one of the most widely used optimization methods for parallel and distributed processing of large datasets. One of the key limitations of distributed SGD is the need to regularly communicate the gradients between different computation nodes. To reduce this communication bottleneck, recent work has considered a one-bit variant of SGD, where only the sign of each gradient element is used in optimization. In this paper, we extend this idea by proposing a stochastic variant of the proximal-gradient method that also uses one-bit per update element. We prove the theoretical convergence of the method for non-convex optimization under a set of explicit assumptions. Our results indicate that the compressed method can match the convergence rate of the uncompressed one, making the proposed method potentially appealing for distributed processing of large datasets.
In this paper, we introduce a proximal-proximal majorization-minimization (PPMM) algorithm for nonconvex tuning-free robust regression problems. The basic idea is to apply the proximal majorization-minimization algorithm to solve the nonconvex problem with the inner subproblems solved by a sparse semismooth Newton (SSN) method based proximal point algorithm (PPA). We must emphasize that the main difficulty in the design of the algorithm lies in how to overcome the singular difficulty of the inner subproblem. Furthermore, we also prove that the PPMM algorithm converges to a d-stationary point. Due to the Kurdyka-Lojasiewicz (KL) property of the problem, we present the convergence rate of the PPMM algorithm. Numerical experiments demonstrate that our proposed algorithm outperforms the existing state-of-the-art algorithms.
In this paper, we consider an accelerated method for solving nonconvex and nonsmooth minimization problems. We propose a Bregman Proximal Gradient algorithm with extrapolation(BPGe). This algorithm extends and accelerates the Bregman Proximal Gradient algorithm (BPG), which circumvents the restrictive global Lipschitz gradient continuity assumption needed in Proximal Gradient algorithms (PG). The BPGe algorithm has higher generality than the recently introduced Proximal Gradient algorithm with extrapolation(PGe), and besides, due to the extrapolation step, BPGe converges faster than BPG algorithm. Analyzing the convergence, we prove that any limit point of the sequence generated by BPGe is a stationary point of the problem by choosing parameters properly. Besides, assuming Kurdyka-{L}ojasiewicz property, we prove the whole sequences generated by BPGe converges to a stationary point. Finally, to illustrate the potential of the new method BPGe, we apply it to two important practical problems that arise in many fundamental applications (and that not satisfy global Lipschitz gradient continuity assumption): Poisson linear inverse problems and quadratic inverse problems. In the tests the accelerated BPGe algorithm shows faster convergence results, giving an interesting new algorithm.
Decentralized optimization is a powerful paradigm that finds applications in engineering and learning design. This work studies decentralized composite optimization problems with non-smooth regularization terms. Most existing gradient-based proximal decentralized methods are known to converge to the optimal solution with sublinear rates, and it remains unclear whether this family of methods can achieve global linear convergence. To tackle this problem, this work assumes the non-smooth regularization term is common across all networked agents, which is the case for many machine learning problems. Under this condition, we design a proximal gradient decentralized algorithm whose fixed point coincides with the desired minimizer. We then provide a concise proof that establishes its linear convergence. In the absence of the non-smooth term, our analysis technique covers the well known EXTRA algorithm and provides useful bounds on the convergence rate and step-size.

suggested questions

comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا