AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods


Published by: Martin Takáč

Publication date: 2021

Research field: Informatics Engineering

Language: English

Authors: Zheng Shi - Nicolas Loizou - Peter Richtarik

Machine Learning; Optimization and Control


We present an adaptive stochastic variance-reduced method with an implicit approach to adaptivity. As a variant of SARAH, our method employs the stochastic recursive gradient yet adjusts the step size based on local geometry. We provide convergence guarantees for finite-sum minimization problems and show that faster convergence than SARAH can be achieved if local geometry permits. Furthermore, we propose a practical, fully adaptive variant, which requires neither knowledge of local geometry nor any effort in tuning hyper-parameters. This algorithm implicitly computes the step size and efficiently estimates the local Lipschitz smoothness of stochastic functions. The numerical experiments demonstrate the algorithm's strong performance compared to its classical counterparts and other state-of-the-art first-order methods.
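For readers unfamiliar with the stochastic recursive gradient mentioned in the abstract, the following is a minimal sketch of the classical SARAH update with a fixed step size. It is not the adaptive AI-SARAH method itself. The least-squares objective, the function name `sarah_least_squares`, the choice of the last inner iterate as the next starting point, and the step size `0.5 / L_max` are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def sarah_least_squares(A, b, step_size, n_outer=20, n_inner=None, seed=0):
    """Plain SARAH (fixed step size) on F(w) = (1/2n) * sum_i (a_i^T w - b_i)^2.

    Hypothetical helper for illustration only; AI-SARAH would replace the
    fixed `step_size` with an implicitly computed, adaptive one.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    n_inner = n if n_inner is None else n_inner
    w = np.zeros(d)
    for _ in range(n_outer):
        # Outer loop: start each epoch from the full gradient.
        v = A.T @ (A @ w - b) / n
        w_prev, w = w, w - step_size * v
        for _ in range(n_inner):
            i = rng.integers(n)
            # Recursive estimator: v_t = grad f_i(w_t) - grad f_i(w_{t-1}) + v_{t-1}
            g_new = A[i] * (A[i] @ w - b[i])
            g_old = A[i] * (A[i] @ w_prev - b[i])
            v = g_new - g_old + v
            w_prev, w = w, w - step_size * v
    return w


# Small synthetic problem (assumed data, noiseless so w_true is the minimizer).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
b = A @ w_true

# Largest per-sample smoothness constant: L_i = ||a_i||^2 for least squares.
L_max = np.max(np.sum(A ** 2, axis=1))
w_hat = sarah_least_squares(A, b, step_size=0.5 / L_max)
print("relative error:", np.linalg.norm(w_hat - w_true) / np.linalg.norm(w_true))
```

The fixed step size here depends on a global smoothness constant, which is exactly the kind of tuning the abstract says the adaptive variant is designed to avoid.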

Read also

Efficient Projection-Free Online Methods with Stochastic Recursive Gradient

Jiahao Xie, Zebang Shen, Chao Zhang (2019)

This paper focuses on projection-free methods for solving smooth Online Convex Optimization (OCO) problems. Existing projection-free methods either achieve suboptimal regret bounds or have high per-iteration computational costs. To fill this gap, two efficient projection-free online methods called ORGFW and MORGFW are proposed for solving stochastic and adversarial OCO problems, respectively. By employing a recursive gradient estimator, our methods achieve optimal regret bounds (up to a logarithmic factor) while possessing low per-iteration computational costs. Experimental results demonstrate the efficiency of the proposed methods compared to state-of-the-art methods.

Machine Learning; Optimization and Control

AdaBatch: Efficient Gradient Aggregation Rules for Sequential and Parallel Stochastic Gradient Methods

Alexandre Defossez (2017)

We study a new aggregation operator for gradients coming from a mini-batch for stochastic gradient (SG) methods that allows a significant speed-up in the case of sparse optimization problems. We call this method AdaBatch, and it requires only a few lines of code change compared to regular mini-batch SGD algorithms. We provide theoretical insight into how this new class of algorithms performs and show that it is equivalent to an implicit per-coordinate rescaling of the gradients, similar to what Adagrad methods do. In theory and in practice, this new aggregation preserves the sample efficiency of SG methods while increasing the batch size. Experimentally, we also show that in the case of smooth convex optimization, our procedure can even obtain a better loss when increasing the batch size for a fixed number of samples. We then apply this new algorithm to obtain a parallelizable stochastic gradient method that is synchronous but allows speed-up on par with Hogwild! methods, as convergence does not deteriorate with the increase of the batch size. The same approach can be used to make mini-batch provably efficient for variance-reduced SG methods such as SVRG.

Machine Learning; Optimization and Control

Stochastic Gradient Methods with Block Diagonal Matrix Adaptation

Jihun Yun, Aurelie C. Lozano, Eunho Yang (2019)

Adaptive gradient approaches that automatically adjust the learning rate on a per-feature basis have been very popular for training deep networks. This rich class of algorithms includes Adagrad, RMSprop, Adam, and recent extensions. All these algorithms have adopted diagonal matrix adaptation, due to the prohibitive computational burden of manipulating full matrices in high dimensions. In this paper, we show that block-diagonal matrix adaptation can be a practical and powerful solution that can effectively utilize structural characteristics of deep learning architectures, and significantly improve convergence and out-of-sample generalization. We present a general framework with block-diagonal matrix updates via coordinate grouping, which includes counterparts of the aforementioned algorithms, prove their convergence in non-convex optimization, highlighting benefits compared to diagona

Machine Learning; Optimization and Control

An Adaptive Remote Stochastic Gradient Method for Training Neural Networks

Yushu Chen, Hao Jing, Wenlai Zhao (2019)

We present the remote stochastic gradient (RSG) method, which computes the gradients at configurable remote observation points, in order to improve the convergence rate and suppress gradient noise at the same time for different curvatures. RSG is further combined with adaptive methods to construct ARSG for acceleration. The method is efficient in computation and memory, and is straightforward to implement. We analyze the convergence properties by modeling the training process as a dynamic system, which provides a guideline to select the configurable observation factor without grid search. ARSG yields an $O(1/\sqrt{T})$ convergence rate in non-convex settings, which can be further improved to $O(\log(T)/T)$ in strongly convex settings. Numerical experiments demonstrate that ARSG achieves both faster convergence and better generalization, compared with popular adaptive methods, such as ADAM, NADAM, AMSGRAD, and RANGER for the tested problems. In particular, for training ResNet-50 on ImageNet, ARSG outperforms ADAM in convergence speed while surpassing SGD in generalization.

Machine Learning; Optimization and Control

A General Family of Stochastic Proximal Gradient Methods for Deep Learning

Jihun Yun, Aurelie C. Lozano, Eunho Yang (2020)

We study the training of regularized neural networks where the regularizer can be non-smooth and non-convex. We propose a unified framework for stochastic proximal gradient descent, which we term ProxGen, that allows for arbitrary positive preconditioners and lower semi-continuous regularizers. Our framework encompasses standard stochastic proximal gradient methods without preconditioners as special cases, which have been extensively studied in various settings. In addition, we present two important update rules beyond the well-known standard methods as a byproduct of our approach: (i) the first closed-form proximal mappings of $\ell_q$ regularization ($0 \leq q \leq 1$) for adaptive stochastic gradient methods, and (ii) a revised version of ProxQuant that fixes a caveat of the original approach for quantization-specific regularizers. We analyze the convergence of ProxGen and show that the whole family of ProxGen enjoys the same convergence rate as stochastic proximal gradient descent without preconditioners. We also empirically show the superiority of proximal methods compared to subgradient-based approaches via extensive experiments. Interestingly, our results indicate that proximal methods with non-convex regularizers are more effective than those with convex regularizers.

Machine Learning; Optimization and Control
