Geom-SPIDER-EM: Faster Variance Reduced Stochastic Expectation Maximization for Nonconvex Finite-Sum Optimization

117 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Gersende Fort

تاريخ النشر 2020

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Gersende Fort - Hoi-To Wai

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The Expectation Maximization (EM) algorithm is a key reference for inference in latent variable models; unfortunately, its computational cost is prohibitive in the large scale learning setting. In this paper, we propose an extension of the Stochastic Path-Integrated Differential EstimatoR EM (SPIDER-EM) and derive complexity bounds for this novel algorithm, designed to solve smooth nonconvex finite-sum optimization problems. We show that it reaches the same state of the art complexity bounds as SPIDER-EM; and provide conditions for a linear rate of convergence. Numerical results support our findings.

قيم البحث

215 - Soham Dan , Dushyant Sahoo 2019

Stochastic Gradient Descent has been widely studied with classification accuracy as a performance measure. However, these stochastic algorithms cannot be directly used when non-decomposable pairwise performance measures are used such as Area under th e ROC curve (AUC) which is a common performance metric when the classes are imbalanced. There have been several algorithms proposed for optimizing AUC as a performance metric, and one of the recent being a stochastic proximal gradient algorithm (SPAM). But the downside of the stochastic methods is that they suffer from high variance leading to slower convergence. To combat this issue, several variance reduced methods have been proposed with faster convergence guarantees than vanilla stochastic gradient descent. Again, these variance reduced methods are not directly applicable when non-decomposable performance measures are used. In this paper, we develop a Variance Reduced Stochastic Proximal algorithm for AUC Maximization (textsc{VRSPAM}) and perform a theoretical analysis as well as empirical analysis to show that our algorithm converges faster than SPAM which is the previous state-of-the-art for the AUC maximization problem.

التعلم الالي التعلم الآلي

Estimate Sequences for Variance-Reduced Stochastic Composite Optimization

169 - Andrei Kulunchakov 2019

In this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization by extending the concept of estimate sequence introduced by Nesterov. This point of view covers the stochastic gradient descent method, variants of the approaches SAGA, SVRG, and has several advantages: (i) we provide a generic proof of convergence for the aforementioned methods; (ii) we show that this SVRG variant is adaptive to strong convexity; (iii) we naturally obtain new algorithms with the same guarantees; (iv) we derive generic strategies to make these algorithms robust to stochastic noise, which is useful when data is corrupted by small random perturbations. Finally, we show that this viewpoint is useful to obtain new accelerated algorithms in the sense of Nesterov.

التعلم الالي التعلم الآلي التحسين والتحكم

Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization

166 - Yangyang Xu 2020

Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs for both convex and nonconvex cases . Most of them require a large number of samples in some or all iterations of the improved SGMs. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm can achieve the optimal complexity result $O(varepsilon^{-3})$ to produce a stochastic $varepsilon$-stationary solution, if a mean-squared smoothness condition holds and $Theta(varepsilon^{-1})$ samples are available for the initial update. Different from existing optimal methods, PStorm can still achieve a near-optimal complexity result $tilde{O}(varepsilon^{-3})$ by using only one or $O(1)$ samples in every update. With this property, PStorm can be applied to online learning problems that favor real-time decisions based on one or $O(1)$ new observations. In addition, for large-scale machine learning problems, PStorm can generalize better by small-batch training than other optimal methods that require large-batch training and the vanilla SGM, as we demonstrate on training a sparse fully-connected neural network and a sparse convolutional neural network.

التحسين والتحكم التعلم الآلي التحليل العددي

Fast Incremental Expectation Maximization for finite-sum optimization: nonasymptotic convergence

83 - Gersende Fort , E. Moulines (CMAP 2020

Fast Incremental Expectation Maximization (FIEM) is a version of the EM framework for large datasets. In this paper, we first recast FIEM and other incremental EM type algorithms in the {em Stochastic Approximation within EM} framework. Then, we prov ide nonasymptotic bounds for the convergence in expectation as a function of the number of examples $n$ and of the maximal number of iterations $kmax$. We propose two strategies for achieving an $epsilon$-approximate stationary point, respectively with $kmax = O(n^{2/3}/epsilon)$ and $kmax = O(sqrt{n}/epsilon^{3/2})$, both strategies relying on a random termination rule before $kmax$ and on a constant step size in the Stochastic Approximation step. Our bounds provide some improvements on the literature. First, they allow $kmax$ to scale as $sqrt{n}$ which is better than $n^{2/3}$ which was the best rate obtained so far; it is at the cost of a larger dependence upon the tolerance $epsilon$, thus making this control relevant for small to medium accuracy with respect to the number of examples $n$. Second, for the $n^{2/3}$-rate, the numerical illustrations show that thanks to an optimized choice of the step size and of the bounds in terms of quantities characterizing the optimization problem at hand, our results desig a less conservative choice of the step size and provide a better control of the convergence in expectation.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

On the Global Convergence of (Fast) Incremental Expectation Maximization Methods

364 - Belhal Karimi , Hoi-To Wai , Eric Moulines 2019

The EM algorithm is one of the most popular algorithm for inference in latent data models. The original formulation of the EM algorithm does not scale to large data set, because the whole data set is required at each iteration of the algorithm. To al leviate this problem, Neal and Hinton have proposed an incremental version of the EM (iEM) in which at each iteration the conditional expectation of the latent data (E-step) is updated only for a mini-batch of observations. Another approach has been proposed by Cappe and Moulines in which the E-step is replaced by a stochastic approximation step, closely related to stochastic gradient. In this paper, we analyze incremental and stochastic version of the EM algorithm as well as the variance reduced-version of Chen et. al. in a common unifying framework. We also introduce a new version incremental version, inspired by the SAGA algorithm by Defazio et. al. We establish non-asymptotic convergence bounds for global convergence. Numerical applications are presented in this article to illustrate our findings.

التعلم الالي التعلم الآلي المنهجية