An Algorithmic Framework of Variable Metric Over-Relaxed Hybrid Proximal Extra-Gradient Method


Published by Li Shen

Publication date 2018

Research field Informatics Engineering

Paper language English

Authors Li Shen - Peng Sun - Yitong Wang

Optimization and Control, Machine Learning, Numerical Analysis



We propose a novel algorithmic framework of Variable Metric Over-Relaxed Hybrid Proximal Extra-gradient (VMOR-HPE) method with a global convergence guarantee for the maximal monotone operator inclusion problem. Its iteration complexities and local linear convergence rate are provided, which theoretically demonstrate that a large over-relaxed step-size contributes to accelerating the proposed VMOR-HPE as a byproduct. Specifically, we find that a large class of primal and primal-dual operator splitting algorithms are all special cases of VMOR-HPE. Hence, the proposed framework offers a new insight into these operator splitting algorithms. In addition, we apply VMOR-HPE to the Karush-Kuhn-Tucker (KKT) generalized equation of linear equality constrained multi-block composite convex optimization, yielding a new algorithm, namely nonsymmetric Proximal Alternating Direction Method of Multipliers with a preconditioned Extra-gradient step in which the preconditioned metric is generated by a blockwise Barzilai-Borwein line search technique (PADMM-EBB). We also establish iteration complexities of PADMM-EBB in terms of the KKT residual. Finally, we apply PADMM-EBB to handle the nonnegative dual graph regularized low-rank representation problem. Promising results on synthetic and real datasets corroborate the efficacy of PADMM-EBB.
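The role of the over-relaxed step size can be illustrated on a much simpler relative of the framework. The sketch below is not VMOR-HPE itself: it is a plain relaxed proximal point iteration with an exact resolvent and a fixed identity metric, applied to a hypothetical 2-D linear monotone operator T(x, y) = (a*x + b*y, -b*x + a*y). The operator, the parameter names (`a`, `b`, `c`, `rho`) and the function names are all assumptions made for illustration.

```python
import math

def resolvent(point, a, b, c):
    """Resolvent (I + c*T)^{-1} of the linear monotone operator
    T(x, y) = (a*x + b*y, -b*x + a*y) with a >= 0, in closed form."""
    x, y = point
    p, q = 1.0 + c * a, c * b
    det = p * p + q * q
    # The inverse of [[p, q], [-q, p]] is (1/det) * [[p, -q], [q, p]].
    return ((p * x - q * y) / det, (q * x + p * y) / det)

def relaxed_proximal_point(start, a, b, c, rho, iters):
    """Iterate z <- z + rho * (J(z) - z), where J is the resolvent.
    rho = 1 is the plain proximal point method; rho in (1, 2) is over-relaxed."""
    z = start
    for _ in range(iters):
        j = resolvent(z, a, b, c)
        z = (z[0] + rho * (j[0] - z[0]), z[1] + rho * (j[1] - z[1]))
    return z

def norm(z):
    """Euclidean distance to the unique zero of T, which is the origin."""
    return math.hypot(z[0], z[1])

# Toy comparison: same operator, same number of iterations, two relaxation factors.
plain = relaxed_proximal_point((1.0, 1.0), a=1.0, b=0.5, c=1.0, rho=1.0, iters=20)
over = relaxed_proximal_point((1.0, 1.0), a=1.0, b=0.5, c=1.0, rho=1.5, iters=20)
```

On this toy operator the run with `rho = 1.5` ends closer to the origin than the run with `rho = 1.0`, which matches the qualitative claim in the abstract; it says nothing about the inexact or variable-metric parts of the method.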


166 - Yangyang Xu 2020

Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs for both convex and nonconvex cases. Most of them require a large number of samples in some or all iterations of the improved SGMs. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm can achieve the optimal complexity result $O(\varepsilon^{-3})$ to produce a stochastic $\varepsilon$-stationary solution, if a mean-squared smoothness condition holds and $\Theta(\varepsilon^{-1})$ samples are available for the initial update. Different from existing optimal methods, PStorm can still achieve a near-optimal complexity result $\tilde{O}(\varepsilon^{-3})$ by using only one or $O(1)$ samples in every update. With this property, PStorm can be applied to online learning problems that favor real-time decisions based on one or $O(1)$ new observations. In addition, for large-scale machine learning problems, PStorm can generalize better by small-batch training than other optimal methods that require large-batch training and the vanilla SGM, as we demonstrate on training a sparse fully-connected neural network and a sparse convolutional neural network.
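A momentum-based variance-reduced estimator combined with a proximal step can be sketched as below. This is a hedged illustration in the style of such methods, not the PStorm algorithm as specified in the paper: the scalar toy objective F(x; xi) = 0.5 * (x - xi)^2, the l1 regularizer, the constant parameters `eta` and `beta`, and all function names are assumptions.

```python
def soft_threshold(z, tau):
    """Proximal operator of tau * |.| for a scalar z."""
    return max(z - tau, 0.0) - max(-z - tau, 0.0)

def sample_grad(x, xi):
    """Stochastic gradient of the toy loss F(x; xi) = 0.5 * (x - xi)**2."""
    return x - xi

def momentum_vr_prox_sgd(x0, samples, eta, beta, lam):
    """Proximal stochastic gradient with a momentum-based variance-reduced
    gradient estimator d, using one sample per update."""
    x_prev = x0
    d = sample_grad(x0, samples[0])  # initial estimator from a single sample
    x = soft_threshold(x0 - eta * d, eta * lam)
    for xi in samples[1:]:
        # The same sample xi is evaluated at the current and previous iterate,
        # so the correction term cancels most of the sampling noise.
        d = sample_grad(x, xi) + (1.0 - beta) * (d - sample_grad(x_prev, xi))
        x_prev, x = x, soft_threshold(x - eta * d, eta * lam)
    return x
```

With `beta = 1` the correction term vanishes and the loop reduces to ordinary proximal SGD; smaller `beta` keeps more of the running estimate. To try it, pass any list of observed samples, for example draws from a distribution centered at 1.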

Optimization and Control, Machine Learning, Numerical Analysis

Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization

240 - Tianyi Chen , Tianyu Ding , Bo Ji 2020

Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method -- Orthant Based Proximal Stochastic Gradient Method (OBProx-SG) -- to solve perhaps the most popular instance, i.e., the $\ell_1$-regularized problem. The OBProx-SG method contains two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the sparsity level via orthant face projection. Compared to the state-of-the-art methods, e.g., Prox-SG, RDA and Prox-SVRG, the OBProx-SG not only converges to the global optimal solutions (in convex scenario) or the stationary points (in non-convex scenario), but also promotes the sparsity of the solutions substantially. Particularly, on a large number of convex problems, OBProx-SG outperforms the existing methods comprehensively in the aspect of sparsity exploration and objective values. Moreover, the experiments on non-convex deep neural networks, e.g., MobileNetV1 and ResNet18, further demonstrate its superiority by achieving the solutions of much higher sparsity without sacrificing generalization accuracy.
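The two steps named in the abstract can be sketched as follows. This is a minimal illustration based on a simplified reading of the abstract, not the authors' implementation: the gradient is supplied by the caller, the rule for switching between the two steps is not shown, and the function names are hypothetical.

```python
def soft_threshold(z, tau):
    """Proximal operator of tau * |.| applied to a scalar."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0

def sign(v):
    """Return -1, 0 or 1."""
    return (v > 0) - (v < 0)

def prox_sg_step(x, grad, alpha, lam):
    """Step (i): proximal (stochastic) gradient step for f(x) + lam * ||x||_1."""
    return [soft_threshold(xi - alpha * gi, alpha * lam) for xi, gi in zip(x, grad)]

def orthant_step(x, grad, alpha, lam):
    """Step (ii): gradient step on the smooth surrogate f(x) + lam * sign(x)^T x
    over the current support, then projection onto the orthant face of x.
    Zero coordinates stay zero; coordinates whose sign would flip become zero."""
    out = []
    for xi, gi in zip(x, grad):
        s = sign(xi)
        if s == 0:
            out.append(0.0)
            continue
        trial = xi - alpha * (gi + lam * s)
        out.append(trial if sign(trial) == s else 0.0)
    return out
```

The projection in `orthant_step` is what enforces sparsity more aggressively: a coordinate is zeroed as soon as a step would carry it across zero, rather than only when it lands inside the soft-threshold interval.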

Optimization and Control, Machine Learning

An inexact Bregman proximal gradient method and its inertial variants

86 - Lei Yang , Kim-Chuan Toh 2021

In this paper, we develop an inexact Bregman proximal gradient (iBPG) method based on a novel two-point inexact stopping condition, and establish the iteration complexity of $\mathcal{O}(1/k)$ as well as the convergence of the sequence under some proper conditions. To improve the convergence speed, we further develop an inertial variant of our iBPG (denoted by v-iBPG) and show that it has the iteration complexity of $\mathcal{O}(1/k^{\gamma})$, where $\gamma\geq 1$ is a restricted relative smoothness exponent. Thus, when $\gamma>1$, the v-iBPG readily improves the $\mathcal{O}(1/k)$ convergence rate of the iBPG. In addition, for the case of using the squared Euclidean distance as the kernel function, we further develop a new inexact accelerated proximal gradient (iAPG) method, which can circumvent the underlying feasibility difficulty often appearing in existing inexact conditions and inherit all desirable convergence properties of the exact APG under proper summable-error conditions. Finally, we conduct some preliminary numerical experiments for solving a relaxation of the quadratic assignment problem to demonstrate the convergence behaviors of the iBPG, v-iBPG and iAPG under different inexactness settings.
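For orientation, a single exact Bregman proximal gradient step has a closed form when the kernel is the negative entropy and the feasible set is the probability simplex. The sketch below shows only that exact step on a hypothetical toy problem; it does not implement the inexact stopping condition or the inertial variants discussed in the abstract. The toy objective, the step size `eta = 1.0` and the function names are assumptions.

```python
import math

def bregman_prox_grad_step(x, grad, eta):
    """One exact Bregman proximal gradient step on the probability simplex
    with the negative-entropy kernel: x_i <- x_i * exp(-eta * g_i), renormalized."""
    w = [xi * math.exp(-eta * gi) for xi, gi in zip(x, grad)]
    total = sum(w)
    return [wi / total for wi in w]

def minimize_on_simplex(target, eta=1.0, iters=200):
    """Toy problem: minimize 0.5 * ||x - target||^2 over the simplex,
    starting from the uniform distribution."""
    n = len(target)
    x = [1.0 / n] * n
    for _ in range(iters):
        grad = [xi - ti for xi, ti in zip(x, target)]
        x = bregman_prox_grad_step(x, grad, eta)
    return x
```

Because the update is multiplicative, every iterate stays strictly inside the simplex without an explicit projection, which is the usual reason for choosing a non-Euclidean kernel on this set.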

Optimization and Control

Metric Subregularity and the Proximal Point Method

126 - D. Leventhal 2009

We examine the linear convergence rates of variants of the proximal point method for finding zeros of maximal monotone operators. We begin by showing how metric subregularity is sufficient for linear convergence to a zero of a maximal monotone operator. This result is then generalized to obtain convergence rates for the problem of finding a common zero of multiple monotone operators by considering randomized and averaged proximal methods.

Optimization and Control, Numerical Analysis

Riemannian Stochastic Proximal Gradient Methods for Nonsmooth Optimization over the Stiefel Manifold

87 - Bokun Wang , Shiqian Ma , Lingzhou Xue 2020

Riemannian optimization has drawn a lot of attention due to its wide applications in practice. Riemannian stochastic first-order algorithms have been studied in the literature to solve large-scale machine learning problems over Riemannian manifolds. However, most of the existing Riemannian stochastic algorithms require the objective function to be differentiable, and they do not apply to the case where the objective function is nonsmooth. In this paper, we present two Riemannian stochastic proximal gradient methods for minimizing nonsmooth functions over the Stiefel manifold. The two methods, named R-ProxSGD and R-ProxSPB, are generalizations of proximal SGD and proximal SpiderBoost in the Euclidean setting to the Riemannian setting. Analysis on the incremental first-order oracle (IFO) complexity of the proposed algorithms is provided. Specifically, the R-ProxSPB algorithm finds an $\epsilon$-stationary point with $\mathcal{O}(\epsilon^{-3})$ IFOs in the online case, and $\mathcal{O}(n+\sqrt{n}\epsilon^{-3})$ IFOs in the finite-sum case with $n$ being the number of summands in the objective. Experimental results on online sparse PCA and robust low-rank matrix completion show that our proposed methods significantly outperform the existing methods that use Riemannian subgradient information.

Optimization and Control, Machine Learning