
Half-Space Proximal Stochastic Gradient Method for Group-Sparsity Regularized Problem

 Added by Tianyi Chen
 Publication date 2020
Language: English





Optimizing with group sparsity is significant in enhancing model interpretability in machine learning applications, e.g., feature selection, compressed sensing and model compression. However, for large-scale stochastic training problems, effective group-sparsity exploration is typically hard to achieve. In particular, state-of-the-art stochastic optimization algorithms usually generate merely dense solutions. To overcome this shortcoming, we propose a stochastic method, the Half-Space Stochastic Projected Gradient (HSPG) method, to search for solutions of high group sparsity while maintaining convergence. Initialized by a simple Prox-SG Step, the HSPG method relies on a novel Half-Space Step to substantially boost the sparsity level. Numerically, HSPG demonstrates its superiority on deep neural networks, e.g., VGG16, ResNet18 and MobileNetV1, by computing solutions of higher group sparsity with competitive objective values and generalization accuracy.
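The abstract names two building blocks without spelling them out, so the following is only a minimal sketch under stated assumptions. It assumes the regularizer is the sum of group l2 norms, for which the proximal operator is the standard group soft-thresholding. The function `half_space_project`, its `eps` parameter, and the specific half-space test are hypothetical illustrations of what a half-space projection could look like, not the authors' definition.

```python
import numpy as np

def group_soft_threshold(x, groups, thresh):
    """Proximal operator of thresh * sum_g ||x_g||_2 (group soft-thresholding)."""
    out = np.zeros_like(x)
    for idx in groups:
        norm = np.linalg.norm(x[idx])
        if norm > thresh:
            # Shrink the whole group toward zero; smaller groups stay at zero.
            out[idx] = (1.0 - thresh / norm) * x[idx]
    return out

def prox_sg_step(x, grad, groups, lr, lam):
    """One proximal stochastic gradient step: gradient step, then group prox."""
    return group_soft_threshold(x - lr * grad, groups, lr * lam)

def half_space_project(x_trial, x_prev, groups, eps=0.0):
    """Hypothetical half-space projection (an assumption, not from the abstract).

    A group is set to zero when its trial value leaves the half-space
    {z : <z_g, x_prev_g> >= eps * ||x_prev_g||^2}; otherwise it is kept.
    """
    out = x_trial.copy()
    for idx in groups:
        ref = x_prev[idx]
        if np.dot(out[idx], ref) < eps * np.dot(ref, ref):
            out[idx] = 0.0
    return out
```

Unlike entrywise thresholding, both functions act on whole groups, so a group is either kept (possibly shrunk) or zeroed entirely, which is what produces group sparsity rather than scattered zeros.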




Read More

Tianyi Chen, Tianyu Ding, Bo Ji (2020)
Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method, the Orthant Based Proximal Stochastic Gradient Method (OBProx-SG), to solve perhaps the most popular instance, i.e., the $\ell_1$-regularized problem. The OBProx-SG method contains two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the sparsity level via orthant face projection. Compared to state-of-the-art methods, e.g., Prox-SG, RDA and Prox-SVRG, OBProx-SG not only converges to globally optimal solutions (in the convex scenario) or to stationary points (in the non-convex scenario), but also substantially promotes the sparsity of the solutions. In particular, on a large number of convex problems, OBProx-SG comprehensively outperforms the existing methods in terms of sparsity exploration and objective values. Moreover, experiments on non-convex deep neural networks, e.g., MobileNetV1 and ResNet18, further demonstrate its superiority by achieving solutions of much higher sparsity without sacrificing generalization accuracy.
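The two steps described above can be illustrated with a minimal sketch. The soft-thresholding operator is the standard proximal operator of the l1 norm. The function `orthant_project` is an assumed reading of "orthant face projection", in which any entry whose sign disagrees with the previous iterate is set to zero; the function names and this exact rule are illustrative, not taken from the abstract.

```python
import numpy as np

def soft_threshold(x, thresh):
    """Proximal operator of thresh * ||x||_1 (entrywise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def prox_sg_l1_step(x, grad, lr, lam):
    """One proximal stochastic gradient step for an l1-regularized objective."""
    return soft_threshold(x - lr * grad, lr * lam)

def orthant_project(x_trial, x_prev):
    """Assumed orthant-face projection (illustrative only).

    Entries keep their trial value when the sign matches x_prev; entries that
    crossed zero, or that were already zero in x_prev, are set to zero.
    """
    return np.where(np.sign(x_trial) == np.sign(x_prev), x_trial, 0.0)
```

Under this reading, the orthant step can only grow the set of zero entries, which is consistent with the abstract's claim that it enhances the sparsity level.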
Yangyang Xu (2020)
Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs for both convex and nonconvex cases. Most of them require a large number of samples in some or all iterations of the improved SGMs. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm can achieve the optimal complexity result $O(\varepsilon^{-3})$ to produce a stochastic $\varepsilon$-stationary solution, if a mean-squared smoothness condition holds and $\Theta(\varepsilon^{-1})$ samples are available for the initial update. Different from existing optimal methods, PStorm can still achieve a near-optimal complexity result $\tilde{O}(\varepsilon^{-3})$ by using only one or $O(1)$ samples in every update. With this property, PStorm can be applied to online learning problems that favor real-time decisions based on one or $O(1)$ new observations. In addition, for large-scale machine learning problems, PStorm can generalize better by small-batch training than other optimal methods that require large-batch training and the vanilla SGM, as we demonstrate on training a sparse fully-connected neural network and a sparse convolutional neural network.
This paper is concerned with a class of zero-norm regularized piecewise linear-quadratic (PLQ) composite minimization problems, which covers the zero-norm regularized $\ell_1$-loss minimization problem as a special case. For this class of nonconvex nonsmooth problems, we show that its equivalent MPEC reformulation is partially calm on the set of global optima and make use of this property to derive a family of equivalent DC surrogates. Then, we propose a proximal majorization-minimization (MM) method, a convex relaxation approach not in the DC algorithm framework, for solving one of the DC surrogates which is a semiconvex PLQ minimization problem involving three nonsmooth terms. For this method, we establish its global convergence and linear rate of convergence, and under suitable conditions show that the limit of the generated sequence is not only a local optimum but also a good critical point in a statistical sense. Numerical experiments are conducted with synthetic and real data for the proximal MM method with the subproblems solved by a dual semismooth Newton method to confirm our theoretical findings, and numerical comparisons with a convergent indefinite-proximal ADMM for the partially smoothed DC surrogate verify its superiority in the quality of solutions and computing time.
Liwei Zhang, Yule Zhang, Jia Wu (2019)
This paper considers the problem of minimizing a convex expectation function over a closed convex set, coupled with a set of inequality convex expectation constraints. We present a new stochastic approximation type algorithm, namely the stochastic approximation proximal method of multipliers (PMMSopt) to solve this convex stochastic optimization problem. We analyze regrets of a stochastic approximation proximal method of multipliers for solving convex stochastic optimization problems. Under mild conditions, we show that this algorithm exhibits ${\rm O}(T^{-1/2})$ rate of convergence, in terms of both optimality gap and constraint violation if parameters in the algorithm are properly chosen, when the objective and constraint functions are generally convex, where $T$ denotes the number of iterations. Moreover, we show that, with at least $1-e^{-T^{1/4}}$ probability, the algorithm has no more than ${\rm O}(T^{-1/4})$ objective regret and no more than ${\rm O}(T^{-1/8})$ constraint violation regret. To the best of our knowledge, this is the first time that such a proximal method for solving expectation constrained stochastic optimization is presented in the literature.
Riemannian optimization has drawn a lot of attention due to its wide applications in practice. Riemannian stochastic first-order algorithms have been studied in the literature to solve large-scale machine learning problems over Riemannian manifolds. However, most of the existing Riemannian stochastic algorithms require the objective function to be differentiable, and they do not apply to the case where the objective function is nonsmooth. In this paper, we present two Riemannian stochastic proximal gradient methods for minimizing a nonsmooth function over the Stiefel manifold. The two methods, named R-ProxSGD and R-ProxSPB, are generalizations of proximal SGD and proximal SpiderBoost from the Euclidean setting to the Riemannian setting. An analysis of the incremental first-order oracle (IFO) complexity of the proposed algorithms is provided. Specifically, the R-ProxSPB algorithm finds an $\epsilon$-stationary point with $\mathcal{O}(\epsilon^{-3})$ IFOs in the online case, and $\mathcal{O}(n+\sqrt{n}\epsilon^{-3})$ IFOs in the finite-sum case, with $n$ being the number of summands in the objective. Experimental results on online sparse PCA and robust low-rank matrix completion show that our proposed methods significantly outperform the existing methods that use Riemannian subgradient information.