Variance-Reduced Proximal and Splitting Schemes for Monotone Stochastic Generalized Equations

Published by Shisheng Cui

Publication date: 2020

Research field: Informatics Engineering

Paper language: English

Authors: Shisheng Cui, Uday V. Shanbhag

Optimization and Control, Machine Learning

Abstract

We consider monotone inclusion problems where the operators may be expectation-valued. A direct application of proximal and splitting schemes is complicated by the need to resolve problems with expectation-valued maps at each step, a concern that is addressed by using sampling. Accordingly, we propose two avenues for addressing uncertainty in the mapping. (i) Variance-reduced stochastic proximal point method (vr-SPP). We develop amongst the first variance-reduced stochastic proximal-point schemes that achieve deterministic rates of convergence in terms of solving proximal-point problems. In addition, it is shown that the schemes are characterized by either optimal or near-optimal oracle (or sample) complexity guarantees. Finally, the generated sequences are shown to be convergent to a solution in an almost-sure sense in both monotone and strongly monotone regimes; (ii) Variance-reduced stochastic modified forward-backward splitting scheme (vr-SMFBS). In constrained settings, we consider structured settings where the map can be decomposed into an expectation-valued map $A$ and a maximal monotone map $B$ with a tractable resolvent. Akin to (i), we show that the proposed schemes are equipped with a.s. convergence guarantees, linear (strongly monotone $A$) and $\mathcal{O}(1/k)$ (monotone $A$) rates of convergence while achieving optimal oracle complexity bounds. Of these, the rate statements in monotone regimes rely on leveraging the Fitzpatrick gap function for monotone inclusions. Furthermore, the schemes rely on weaker moment requirements on noise and allow for weakening unbiasedness requirements on oracles in strongly monotone regimes. Preliminary numerics reflect these findings and show that the variance-reduced schemes outperform stochastic approximation schemes, stochastic splitting and proximal point schemes, and sample-average approximation approaches.
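To make the idea of sampling an expectation-valued map $A$ while applying the resolvent of $B$ concrete, here is a minimal sketch. It is not the authors' vr-SMFBS: it uses a plain (unmodified) forward-backward step, and the toy map, the step size, the geometric batch-growth factor, and all function names are assumptions chosen for illustration. Here $A(x) = Mx - b$ is strongly monotone and $B$ is the normal cone of the nonnegative orthant, whose resolvent is a projection.

```python
import random

def sample_map(x, rng, noise=1.0):
    """One noisy sample of the hypothetical map A(x) = M x - b."""
    # M = [[2, 1], [-1, 2]] has symmetric part 2*I, so A is strongly monotone.
    a0 = 2.0 * x[0] + 1.0 * x[1] - 1.0
    a1 = -1.0 * x[0] + 2.0 * x[1] + 3.0
    return (a0 + rng.gauss(0.0, noise), a1 + rng.gauss(0.0, noise))

def resolvent_nonneg(y):
    """Resolvent of B = normal cone of the nonnegative orthant (a projection)."""
    return tuple(max(0.0, v) for v in y)

def vr_forward_backward(x0, step=0.2, iters=60, growth=1.1, seed=0):
    """Forward-backward iteration with a sample-average of A on growing batches."""
    rng = random.Random(seed)
    x = tuple(x0)
    for k in range(iters):
        batch = int(growth ** k) + 1  # assumed geometric batch-size schedule
        s0 = s1 = 0.0
        for _ in range(batch):
            a0, a1 = sample_map(x, rng)
            s0 += a0
            s1 += a1
        # Forward step with the averaged map, backward step via the resolvent.
        x = resolvent_nonneg((x[0] - step * s0 / batch, x[1] - step * s1 / batch))
    return x

x_final = vr_forward_backward((5.0, 5.0))
```

For this toy problem the solution of the inclusion is $x^* = (0.5, 0)$, so `x_final` should land near that point; the growing batch is what shrinks the noise in the averaged map as the iterates approach it.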

166 - Yangyang Xu 2020

Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs for both convex and nonconvex cases. Most of them require a large number of samples in some or all iterations of the improved SGMs. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm can achieve the optimal complexity result $O(\varepsilon^{-3})$ to produce a stochastic $\varepsilon$-stationary solution, if a mean-squared smoothness condition holds and $\Theta(\varepsilon^{-1})$ samples are available for the initial update. Different from existing optimal methods, PStorm can still achieve a near-optimal complexity result $\tilde{O}(\varepsilon^{-3})$ by using only one or $O(1)$ samples in every update. With this property, PStorm can be applied to online learning problems that favor real-time decisions based on one or $O(1)$ new observations. In addition, for large-scale machine learning problems, PStorm can generalize better by small-batch training than other optimal methods that require large-batch training and the vanilla SGM, as we demonstrate on training a sparse fully-connected neural network and a sparse convolutional neural network.
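A rough sketch of how a momentum-based variance-reduced estimator can be paired with a proximal step using one sample per update is given below. It is an illustration under assumptions, not PStorm itself: the recursive estimator is the generic form commonly used in momentum-based variance reduction, the toy objective is convex, and the constant step size, momentum weight `beta`, data distribution, and function names are all hypothetical choices rather than the schedule analyzed in the paper.

```python
import random

def soft_threshold(v, t):
    """Proximal map of t * |x| (soft-thresholding)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def sample_grad(x, xi):
    """Stochastic gradient of the toy loss 0.5 * (x - xi)**2 for one sample xi."""
    return x - xi

def prox_momentum_vr(x0=0.0, step=0.1, beta=0.02, lam=0.5, iters=1000, seed=0):
    """Proximal updates driven by a recursive momentum gradient estimator."""
    rng = random.Random(seed)

    def draw():
        return rng.gauss(2.0, 1.0)  # hypothetical data distribution

    x_prev = x0
    d = sample_grad(x_prev, draw())  # initial estimate from a single sample
    x = soft_threshold(x_prev - step * d, step * lam)
    for _ in range(iters):
        xi = draw()  # one fresh sample, evaluated at both the new and old iterate
        d = sample_grad(x, xi) + (1.0 - beta) * (d - sample_grad(x_prev, xi))
        x_prev, x = x, soft_threshold(x - step * d, step * lam)
    return x

x_final = prox_momentum_vr()
```

For this toy problem the regularized minimizer is $x^* = 1.5$ (the mean 2 shifted by `lam`), so `x_final` should hover near it. The correction term `d - sample_grad(x_prev, xi)` reuses the same sample at two consecutive iterates, which is what lets the estimator average out noise without large batches.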

Optimization and Control, Machine Learning, Numerical Analysis

Variance Reduced Stochastic Proximal Algorithm for AUC Maximization

215 - Soham Dan , Dushyant Sahoo 2019

Stochastic Gradient Descent has been widely studied with classification accuracy as a performance measure. However, these stochastic algorithms cannot be directly used with non-decomposable pairwise performance measures such as Area under the ROC curve (AUC), which is a common performance metric when the classes are imbalanced. Several algorithms have been proposed for optimizing AUC as a performance metric, one of the most recent being a stochastic proximal gradient algorithm (SPAM). The downside of stochastic methods is that they suffer from high variance, leading to slower convergence. To combat this issue, several variance-reduced methods have been proposed with faster convergence guarantees than vanilla stochastic gradient descent. Again, these variance-reduced methods are not directly applicable when non-decomposable performance measures are used. In this paper, we develop a Variance Reduced Stochastic Proximal algorithm for AUC Maximization (VRSPAM) and perform a theoretical as well as an empirical analysis to show that our algorithm converges faster than SPAM, the previous state of the art for the AUC maximization problem.

Machine Learning

On the analysis of variance-reduced and randomized projection variants of single projection schemes for monotone stochastic variational inequality problems

92 - Shisheng Cui , Uday V. Shanbhag 2019

Classical extragradient schemes and their stochastic counterparts represent a cornerstone for resolving monotone variational inequality problems. Yet, such schemes have a per-iteration complexity of two projections onto a convex set and require two evaluations of the map, the former of which could be relatively expensive if $X$ is a complicated set. We consider two related avenues where the per-iteration complexity is significantly reduced: (i) A stochastic projected reflected gradient method requiring a single evaluation of the map and a single projection; and (ii) A stochastic subgradient extragradient method that requires two evaluations of the map, a single projection onto $X$, and a significantly cheaper projection (onto a halfspace) computable in closed form. Under a variance-reduced framework reliant on a sample-average of the map based on an increasing batch-size, we prove almost sure (a.s.) convergence of the iterates to a random point in the solution set for both schemes. Additionally, both schemes display a non-asymptotic rate of $\mathcal{O}(1/K)$ where $K$ denotes the number of iterations; notably, both rates match those obtained in deterministic regimes. To address feasibility sets given by the intersection of a large number of convex constraints, we adapt both of the aforementioned schemes to a random projection framework. We then show that the random projection analogs of both schemes also display a.s. convergence under a weak-sharpness requirement; furthermore, without imposing the weak-sharpness requirement, both schemes are characterized by a provable rate of $\mathcal{O}(1/\sqrt{K})$ in terms of the gap function of the projection of the averaged sequence onto $X$ as well as the infeasibility of this sequence. Preliminary numerics support theoretical findings and the schemes outperform standard extragradient schemes in terms of the per-iteration complexity.
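The single-evaluation, single-projection structure of scheme (i) can be sketched as follows. This is an illustrative toy rather than the paper's algorithm or parameter choices: the update $x_{k+1} = \Pi_X\big(x_k - \gamma \bar{F}(2x_k - x_{k-1})\big)$ is the standard projected reflected gradient step with $\bar{F}$ a sample average, while the skew-symmetric map, the box $X = [-1,1]^2$, the linear batch schedule, and the step size (kept below $(\sqrt{2}-1)/L$, the range usually quoted for the deterministic method, with $L = 1$ here) are assumptions.

```python
import random

def noisy_map(y, rng, sigma=0.5):
    """One sample of the toy monotone map F(y) = (y[1], -y[0]) plus Gaussian noise."""
    return (y[1] + rng.gauss(0.0, sigma), -y[0] + rng.gauss(0.0, sigma))

def project_box(y, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box X = [lo, hi]^2."""
    return tuple(min(hi, max(lo, v)) for v in y)

def stochastic_reflected_gradient(x0=(1.0, 1.0), step=0.3, iters=300, seed=0):
    """Projected reflected gradient with a sample-averaged map on growing batches."""
    rng = random.Random(seed)
    x_prev = x = tuple(x0)
    for k in range(iters):
        batch = k + 1  # assumed increasing batch-size schedule
        # Reflected point: the map is evaluated once per iteration, here.
        y = (2.0 * x[0] - x_prev[0], 2.0 * x[1] - x_prev[1])
        s0 = s1 = 0.0
        for _ in range(batch):
            f0, f1 = noisy_map(y, rng)
            s0 += f0
            s1 += f1
        # Single projection onto X per iteration.
        x_prev, x = x, project_box((x[0] - step * s0 / batch, x[1] - step * s1 / batch))
    return x

x_final = stochastic_reflected_gradient()
```

The toy map is monotone but not strongly monotone, and the origin solves the variational inequality on the box, so `x_final` should end up close to `(0, 0)`.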

Optimization and Control

Randomized Stochastic Variance-Reduced Methods for Multi-Task Stochastic Bilevel Optimization

386 - Zhishuai Guo , Quanqi Hu , Lijun Zhang 2021

In this paper, we consider non-convex stochastic bilevel optimization (SBO) problems that have many applications in machine learning. Although numerous studies have proposed stochastic algorithms for solving these problems, they are limited in two respects: (i) their sample complexities are high and do not match the state-of-the-art result for non-convex stochastic optimization; (ii) their algorithms are tailored to problems with only one lower-level problem. When there are many lower-level problems, it could be prohibitive to process all these lower-level problems at each iteration. To address these limitations, this paper proposes fast randomized stochastic algorithms for non-convex SBO problems. First, we present a stochastic method for non-convex SBO with only one lower problem and establish its sample complexity of $O(1/\epsilon^3)$ for finding an $\epsilon$-stationary point under Lipschitz continuous conditions of stochastic oracles, matching the lower bound for stochastic smooth non-convex optimization. Second, we present a randomized stochastic method for non-convex SBO with $m>1$ lower level problems (multi-task SBO) by processing a constant number of lower problems at each iteration, and establish its sample complexity no worse than $O(m/\epsilon^3)$, which could be a better complexity than that of simply processing all $m$ lower problems at each iteration. Lastly, we establish even faster convergence results for gradient-dominant functions. To the best of our knowledge, this is the first work considering multi-task SBO and developing state-of-the-art sample complexity results.

Optimization and Control, Machine Learning

Improved Sample Complexity for Stochastic Compositional Variance Reduced Gradient

249 - Tianyi Lin , Chenyou Fan , Mengdi Wang 2018

Convex composition optimization is an emerging topic that covers a wide range of applications arising from stochastic optimal control, reinforcement learning and multi-stage stochastic programming. Existing algorithms suffer from unsatisfactory sample complexity and practical issues since they ignore the convexity structure in the algorithmic design. In this paper, we develop a new stochastic compositional variance-reduced gradient algorithm with the sample complexity of $O((m+n)\log(1/\epsilon)+1/\epsilon^3)$ where $m+n$ is the total number of samples. Our algorithm is near-optimal as the dependence on $m+n$ is optimal up to a logarithmic factor. Experimental results on real-world datasets demonstrate the effectiveness and efficiency of the new algorithm.

Optimization and Control, Machine Learning