Mini-batch stochastic Nesterovs smoothing method for constrained convex stochastic composite optimization

247 0 0.0 ( 0 )

Download Cite

Added by Ruyu Wang

Publication date 2021

fields

and research's language is English

Authors Ruyu Wang - Chao Zhang - Lichun Wang

Optimization and Control

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

This paper considers a class of constrained convex stochastic composite optimization problems whose objective function is given by the summation of a differentiable convex component, together with a nonsmooth but convex component. The nonsmooth component has an explicit max structure that may not easy to compute its proximal mapping. In order to solve these problems, we propose a mini-batch stochastic Nesterovs smoothing (MSNS) method. Convergence and the optimal iteration complexity of the method are established. Numerical results are provided to illustrate the efficiency of the proposed MSNS method for a support vector machine (SVM) model.

rate research

An Asynchronous Mini-Batch Algorithm for Regularized Stochastic Optimization

682 - Hamid Reza Feyzmahdavian , Arda Aytekin , Mikael Johansson 2015

Mini-batch optimization has proven to be a powerful paradigm for large-scale learning. However, the state of the art parallel mini-batch algorithms assume synchronous operation or cyclic update orders. When worker nodes are heterogeneous (due to different computational capabilities or different communication delays), synchronous and cyclic operations are inefficient since they will leave workers idle waiting for the slower nodes to complete their computations. In this paper, we propose an asynchronous mini-batch algorithm for regularized stochastic optimization problems with smooth loss functions that eliminates idle waiting and allows workers to run at their maximal update rates. We show that by suitably choosing the step-size values, the algorithm achieves a rate of the order $O(1/sqrt{T})$ for general convex regularization functions, and the rate $O(1/T)$ for strongly convex regularization functions, where $T$ is the number of iterations. In both cases, the impact of asynchrony on the convergence rate of our algorithm is asymptotically negligible, and a near-linear speedup in the number of workers can be expected. Theoretical results are confirmed in real implementations on a distributed computing infrastructure.

Optimization and Control Systems and Control Machine Learning

Adaptive Primal-Dual Stochastic Gradient Method for Expectation-constrained Convex Stochastic Programs

107 - Yonggui Yan , Yangyang Xu 2020

Stochastic gradient methods (SGMs) have been widely used for solving stochastic optimization problems. A majority of existing works assume no constraints or easy-to-project constraints. In this paper, we consider convex stochastic optimization problems with expectation constraints. For these problems, it is often extremely expensive to perform projection onto the feasible set. Several SGMs in the literature can be applied to solve the expectation-constrained stochastic problems. We propose a novel primal-dual type SGM based on the Lagrangian function. Different from existing methods, our method incorporates an adaptiveness technique to speed up convergence. At each iteration, our method inquires an unbiased stochastic subgradient of the Lagrangian function, and then it renews the primal variables by an adaptive-SGM update and the dual variables by a vanilla-SGM update. We show that the proposed method has a convergence rate of $O(1/sqrt{k})$ in terms of the objective error and the constraint violation. Although the convergence rate is the same as those of existing SGMs, we observe its significantly faster convergence than an existing non-adaptive primal-dual SGM and a primal SGM on solving the Neyman-Pearson classification and quadratically constrained quadratic programs. Furthermore, we modify the proposed method to solve convex-concave stochastic minimax problems, for which we perform adaptive-SGM updates to both primal and dual variables. A convergence rate of $O(1/sqrt{k})$ is also established to the modified method for solving minimax problems in terms of primal-dual gap.

Optimization and Control

Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization

166 - Yangyang Xu 2020

Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs for both convex and nonconvex cases. Most of them require a large number of samples in some or all iterations of the improved SGMs. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm can achieve the optimal complexity result $O(varepsilon^{-3})$ to produce a stochastic $varepsilon$-stationary solution, if a mean-squared smoothness condition holds and $Theta(varepsilon^{-1})$ samples are available for the initial update. Different from existing optimal methods, PStorm can still achieve a near-optimal complexity result $tilde{O}(varepsilon^{-3})$ by using only one or $O(1)$ samples in every update. With this property, PStorm can be applied to online learning problems that favor real-time decisions based on one or $O(1)$ new observations. In addition, for large-scale machine learning problems, PStorm can generalize better by small-batch training than other optimal methods that require large-batch training and the vanilla SGM, as we demonstrate on training a sparse fully-connected neural network and a sparse convolutional neural network.

Optimization and Control Machine Learning Numerical Analysis

A stochastic linearized proximal method of multipliers for convex stochastic optimization with expectation constraints

235 - Liwei Zhang , Yule Zhang , Jia Wu 2021

This paper considers the problem of minimizing a convex expectation function with a set of inequality convex expectation constraints. We present a computable stochastic approximation type algorithm, namely the stochastic linearized proximal method of multipliers, to solve this convex stochastic optimization problem. This algorithm can be roughly viewed as a hybrid of stochastic approximation and the traditional proximal method of multipliers. Under mild conditions, we show that this algorithm exhibits $O(K^{-1/2})$ expected convergence rates for both objective reduction and constraint violation if parameters in the algorithm are properly chosen, where $K$ denotes the number of iterations. Moreover, we show that, with high probability, the algorithm has $O(log(K)K^{-1/2})$ constraint violation bound and $O(log^{3/2}(K)K^{-1/2})$ objective bound. Some preliminary numerical results demonstrate the performance of the proposed algorithm.

Optimization and Control Machine Learning

A Hybrid Variance-Reduced Method for Decentralized Stochastic Non-Convex Optimization

88 - Ran Xin , Usman A. Khan , Soummya Kar 2021

This paper considers decentralized stochastic optimization over a network of $n$ nodes, where each node possesses a smooth non-convex local cost function and the goal of the networked nodes is to find an $epsilon$-accurate first-order stationary point of the sum of the local costs. We focus on an online setting, where each node accesses its local cost only by means of a stochastic first-order oracle that returns a noisy version of the exact gradient. In this context, we propose a novel single-loop decentralized hybrid variance-reduced stochastic gradient method, called GT-HSGD, that outperforms the existing approaches in terms of both the oracle complexity and practical implementation. The GT-HSGD algorithm implements specialized local hybrid stochastic gradient estimators that are fused over the network to track the global gradient. Remarkably, GT-HSGD achieves a network topology-independent oracle complexity of $O(n^{-1}epsilon^{-3})$ when the required error tolerance $epsilon$ is small enough, leading to a linear speedup with respect to the centralized optimal online variance-reduced approaches that operate on a single node. Numerical experiments are provided to illustrate our main technical results.

Optimization and Control Distributed Parallel and Cluster Computing Machine Learning