ترغب بنشر مسار تعليمي؟ اضغط هنا

Variance Reduced Stochastic Proximal Algorithm for AUC Maximization

216   0   0.0 ( 0 )
 نشر من قبل Dushyant Sahoo
 تاريخ النشر 2019
والبحث باللغة English




اسأل ChatGPT حول البحث

Stochastic Gradient Descent has been widely studied with classification accuracy as a performance measure. However, these stochastic algorithms cannot be directly used when non-decomposable pairwise performance measures are used such as Area under the ROC curve (AUC) which is a common performance metric when the classes are imbalanced. There have been several algorithms proposed for optimizing AUC as a performance metric, and one of the recent being a stochastic proximal gradient algorithm (SPAM). But the downside of the stochastic methods is that they suffer from high variance leading to slower convergence. To combat this issue, several variance reduced methods have been proposed with faster convergence guarantees than vanilla stochastic gradient descent. Again, these variance reduced methods are not directly applicable when non-decomposable performance measures are used. In this paper, we develop a Variance Reduced Stochastic Proximal algorithm for AUC Maximization (textsc{VRSPAM}) and perform a theoretical analysis as well as empirical analysis to show that our algorithm converges faster than SPAM which is the previous state-of-the-art for the AUC maximization problem.



قيم البحث

اقرأ أيضاً

266 - Yunwen Lei , Yiming Ying 2019
In this paper we consider the problem of maximizing the Area under the ROC curve (AUC) which is a widely used performance metric in imbalanced classification and anomaly detection. Due to the pairwise nonlinearity of the objective function, classical SGD algorithms do not apply to the task of AUC maximization. We propose a novel stochastic proximal algorithm for AUC maximization which is scalable to large scale streaming data. Our algorithm can accommodate general penalty terms and is easy to implement with favorable $O(d)$ space and per-iteration time complexities. We establish a high-probability convergence rate $O(1/sqrt{T})$ for the general convex setting, and improve it to a fast convergence rate $O(1/T)$ for the cases of strongly convex regularizers and no regularization term (without strong convexity). Our proof does not need the uniform boundedness assumption on the loss function or the iterates which is more fidelity to the practice. Finally, we perform extensive experiments over various benchmark data sets from real-world application domains which show the superior performance of our algorithm over the existing AUC maximization algorithms.
The Expectation Maximization (EM) algorithm is a key reference for inference in latent variable models; unfortunately, its computational cost is prohibitive in the large scale learning setting. In this paper, we propose an extension of the Stochastic Path-Integrated Differential EstimatoR EM (SPIDER-EM) and derive complexity bounds for this novel algorithm, designed to solve smooth nonconvex finite-sum optimization problems. We show that it reaches the same state of the art complexity bounds as SPIDER-EM; and provide conditions for a linear rate of convergence. Numerical results support our findings.
In this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization by extending the concept of estimate sequence introduced by Nesterov. This point of view covers the stochastic gradient descent method, variants of the approaches SAGA, SVRG, and has several advantages: (i) we provide a generic proof of convergence for the aforementioned methods; (ii) we show that this SVRG variant is adaptive to strong convexity; (iii) we naturally obtain new algorithms with the same guarantees; (iv) we derive generic strategies to make these algorithms robust to stochastic noise, which is useful when data is corrupted by small random perturbations. Finally, we show that this viewpoint is useful to obtain new accelerated algorithms in the sense of Nesterov.
We consider monotone inclusion problems where the operators may be expectation-valued. A direct application of proximal and splitting schemes is complicated by resolving problems with expectation-valued maps at each step, a concern that is addressed by using sampling. Accordingly, we propose avenues for addressing uncertainty in the mapping. (i) Variance-reduced stochastic proximal point method (vr-SPP). We develop amongst the first variance-reduced stochastic proximal-point schemes that achieves deterministic rates of convergence in terms of solving proximal-point problems. In addition, it is shown that the schemes are characterized by either optimal or near-optimal oracle (or sample) complexity guarantees. Finally, the generated sequences are shown to be convergent to a solution in an almost-sure sense in both monotone and strongly monotone regimes; (ii) Variance-reduced stochastic modified forward-backward splitting scheme (vr-SMFBS). In constrained settings, we consider structured settings when the map can be decomposed into an expectation-valued map $A$ and a maximal monotone map $B$ with a tractable resolvent. Akin to (i), we show that the proposed schemes are equipped with a.s. convergence guarantees, linear (strongly monotone $A$) and $mathcal{O}(1/k)$ (monotone $A$) rates of convergence while achieving optimal oracle complexity bounds. Of these, the rate statements in monotone regimes rely on leveraging the Fitzpatrick gap function for monotone inclusions. Furthermore, the schemes rely on weaker moment requirements on noise as well as allow for weakening unbiasedness requirements on oracles in strongly monotone regimes. Preliminary numerics reflect these findings and show that the variance-reduced schemes outperform stochastic approximation schemes, stochastic splitting and proximal point schemes, and sample-average approximation approaches.
166 - Yangyang Xu 2020
Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs for both convex and nonconvex cases . Most of them require a large number of samples in some or all iterations of the improved SGMs. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm can achieve the optimal complexity result $O(varepsilon^{-3})$ to produce a stochastic $varepsilon$-stationary solution, if a mean-squared smoothness condition holds and $Theta(varepsilon^{-1})$ samples are available for the initial update. Different from existing optimal methods, PStorm can still achieve a near-optimal complexity result $tilde{O}(varepsilon^{-3})$ by using only one or $O(1)$ samples in every update. With this property, PStorm can be applied to online learning problems that favor real-time decisions based on one or $O(1)$ new observations. In addition, for large-scale machine learning problems, PStorm can generalize better by small-batch training than other optimal methods that require large-batch training and the vanilla SGM, as we demonstrate on training a sparse fully-connected neural network and a sparse convolutional neural network.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا