AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods

Added by Martin Takáč

Publication date: 2021

Field: Informatics Engineering

Language: English

Authors: Zheng Shi, Nicolas Loizou, Peter Richtarik

Machine Learning Optimization and Control

We present an adaptive stochastic variance-reduced method with an implicit approach to adaptivity. As a variant of SARAH, our method employs the stochastic recursive gradient yet adjusts the step-size based on local geometry. We provide convergence guarantees for finite-sum minimization problems and show that faster convergence than SARAH can be achieved if local geometry permits. Furthermore, we propose a practical, fully adaptive variant, which requires neither knowledge of local geometry nor any effort in tuning hyper-parameters. This algorithm implicitly computes the step-size and efficiently estimates the local Lipschitz smoothness of stochastic functions. The numerical experiments demonstrate the algorithm's strong performance compared to its classical counterparts and other state-of-the-art first-order methods.
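For readers unfamiliar with the stochastic recursive gradient mentioned in the abstract, the following is a minimal sketch of classical SARAH with a fixed step size, not the adaptive AI-SARAH variant (the abstract does not spell out its implicit step-size rule). The recursion used is v_t = ∇f_i(w_t) − ∇f_i(w_{t−1}) + v_{t−1}, restarted from a full gradient at the start of each outer loop. The least-squares objective and all function and parameter names are assumptions chosen for illustration.

```python
import random

def grad_i(w, x, y):
    """Gradient of the single-sample loss 0.5 * (x.w - y)^2 with respect to w."""
    residual = sum(xj * wj for xj, wj in zip(x, w)) - y
    return [residual * xj for xj in x]

def full_grad(w, X, Y):
    """Gradient of the finite-sum objective (mean of the per-sample losses)."""
    n = len(X)
    g = [0.0] * len(w)
    for x, y in zip(X, Y):
        g = [a + b / n for a, b in zip(g, grad_i(w, x, y))]
    return g

def sarah(X, Y, w0, step, outer_iters, inner_iters, seed=0):
    """Classical SARAH with a fixed step size (illustrative, not AI-SARAH)."""
    rng = random.Random(seed)
    w = list(w0)
    for _ in range(outer_iters):
        # Outer loop: restart the estimator from an exact full gradient.
        v = full_grad(w, X, Y)
        w_prev = w
        w = [wj - step * vj for wj, vj in zip(w, v)]
        for _ in range(inner_iters):
            # Inner loop: recursive update of the gradient estimate v
            # using one sample evaluated at the current and previous iterates.
            i = rng.randrange(len(X))
            g_new = grad_i(w, X[i], Y[i])
            g_old = grad_i(w_prev, X[i], Y[i])
            v = [gn - go + vj for gn, go, vj in zip(g_new, g_old, v)]
            w_prev = w
            w = [wj - step * vj for wj, vj in zip(w, v)]
    return w

# Hypothetical toy problem: labels generated exactly by w* = [2, -1].
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
Y = [2.0, -1.0, 1.0, 3.0]
w_hat = sarah(X, Y, [0.0, 0.0], step=0.1, outer_iters=100, inner_iters=8)
```

In this sketch the step size is a hand-picked constant; according to the abstract, removing exactly this kind of manual tuning is what the adaptive variant is designed for.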

Read More

Efficient Projection-Free Online Methods with Stochastic Recursive Gradient

Jiahao Xie, Zebang Shen, Chao Zhang, 2019

This paper focuses on projection-free methods for solving smooth Online Convex Optimization (OCO) problems. Existing projection-free methods either achieve suboptimal regret bounds or have high per-iteration computational costs. To fill this gap, two efficient projection-free online methods called ORGFW and MORGFW are proposed for solving stochastic and adversarial OCO problems, respectively. By employing a recursive gradient estimator, our methods achieve optimal regret bounds (up to a logarithmic factor) while possessing low per-iteration computational costs. Experimental results demonstrate the efficiency of the proposed methods compared to state-of-the-art methods.

Machine Learning Optimization and Control Machine Learning

AdaBatch: Efficient Gradient Aggregation Rules for Sequential and Parallel Stochastic Gradient Methods

Alexandre Defossez, 2017

We study a new aggregation operator for gradients coming from a mini-batch for stochastic gradient (SG) methods that allows a significant speed-up in the case of sparse optimization problems. We call this method AdaBatch; it requires changing only a few lines of code compared to regular mini-batch SGD algorithms. We provide theoretical insight into how this new class of algorithms performs and show that it is equivalent to an implicit per-coordinate rescaling of the gradients, similar to what Adagrad methods do. In theory and in practice, this new aggregation makes it possible to keep the same sample efficiency as SG methods while increasing the batch size. Experimentally, we also show that in the case of smooth convex optimization, our procedure can even obtain a better loss when increasing the batch size for a fixed number of samples. We then apply this new algorithm to obtain a parallelizable stochastic gradient method that is synchronous but allows speed-up on par with Hogwild! methods, as convergence does not deteriorate with the increase of the batch size. The same approach can be used to make mini-batching provably efficient for variance-reduced SG methods such as SVRG.
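To make the idea of a per-coordinate aggregation rule concrete, here is a small sketch contrasting the usual mini-batch mean with one plausible sparse-aware rule. It assumes that each coordinate's summed gradient is divided by the number of samples whose gradient is non-zero on that coordinate, rather than by the batch size. The abstract only states that the rule amounts to an implicit per-coordinate rescaling, so this exact form, and the function names, are assumptions rather than the paper's definition.

```python
def minibatch_mean(grads):
    """Standard aggregation: divide every coordinate by the batch size."""
    batch_size = len(grads)
    dim = len(grads[0])
    return [sum(g[j] for g in grads) / batch_size for j in range(dim)]

def sparse_aware_aggregate(grads):
    """Assumed AdaBatch-style rule: divide each coordinate by the number
    of samples with a non-zero gradient on that coordinate."""
    dim = len(grads[0])
    aggregated = []
    for j in range(dim):
        nonzero = [g[j] for g in grads if g[j] != 0.0]
        aggregated.append(sum(nonzero) / len(nonzero) if nonzero else 0.0)
    return aggregated

# Hypothetical sparse per-sample gradients for a batch of four samples.
batch = [
    [1.0, 0.0, 0.0],
    [3.0, 0.0, 2.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 4.0],
]
print(minibatch_mean(batch))          # [1.0, 0.0, 1.5]
print(sparse_aware_aggregate(batch))  # [2.0, 0.0, 3.0]
```

Under this assumed rule, rarely active coordinates are not shrunk as the batch grows, which is one way to read the claim that sample efficiency is preserved at larger batch sizes.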

Machine Learning Optimization and Control Machine Learning

Stochastic Gradient Methods with Block Diagonal Matrix Adaptation

Jihun Yun, Aurelie C. Lozano, Eunho Yang, 2019

Adaptive gradient approaches that automatically adjust the learning rate on a per-feature basis have been very popular for training deep networks. This rich class of algorithms includes Adagrad, RMSprop, Adam, and recent extensions. All these algorithms have adopted diagonal matrix adaptation, due to the prohibitive computational burden of manipulating full matrices in high dimensions. In this paper, we show that block-diagonal matrix adaptation can be a practical and powerful solution that can effectively utilize structural characteristics of deep learning architectures, and significantly improve convergence and out-of-sample generalization. We present a general framework with block-diagonal matrix updates via coordinate grouping, which includes counterparts of the aforementioned algorithms, prove their convergence in non-convex optimization, highlighting benefits compared to diagona

Machine Learning Optimization and Control Machine Learning

An Adaptive Remote Stochastic Gradient Method for Training Neural Networks

Yushu Chen, Hao Jing, Wenlai Zhao, 2019

We present the remote stochastic gradient (RSG) method, which computes the gradients at configurable remote observation points, in order to improve the convergence rate and suppress gradient noise at the same time for different curvatures. RSG is further combined with adaptive methods to construct ARSG for acceleration. The method is efficient in computation and memory, and is straightforward to implement. We analyze the convergence properties by modeling the training process as a dynamic system, which provides a guideline to select the configurable observation factor without grid search. ARSG yields an $O(1/\sqrt{T})$ convergence rate in non-convex settings, which can be further improved to $O(\log(T)/T)$ in strongly convex settings. Numerical experiments demonstrate that ARSG achieves both faster convergence and better generalization, compared with popular adaptive methods, such as ADAM, NADAM, AMSGRAD, and RANGER for the tested problems. In particular, for training ResNet-50 on ImageNet, ARSG outperforms ADAM in convergence speed and meanwhile it surpasses SGD in generalization.

Machine Learning Optimization and Control Machine Learning

A General Family of Stochastic Proximal Gradient Methods for Deep Learning

Jihun Yun, Aurelie C. Lozano, Eunho Yang, 2020

We study the training of regularized neural networks where the regularizer can be non-smooth and non-convex. We propose a unified framework for stochastic proximal gradient descent, which we term ProxGen, that allows for arbitrary positive preconditioners and lower semi-continuous regularizers. Our framework encompasses standard stochastic proximal gradient methods without preconditioners as special cases, which have been extensively studied in various settings. Not only that, we present two important update rules beyond the well-known standard methods as a byproduct of our approach: (i) the first closed-form proximal mappings of $\ell_q$ regularization ($0 \leq q \leq 1$) for adaptive stochastic gradient methods, and (ii) a revised version of ProxQuant that fixes a caveat of the original approach for quantization-specific regularizers. We analyze the convergence of ProxGen and show that the whole family of ProxGen enjoys the same convergence rate as stochastic proximal gradient descent without preconditioners. We also empirically show the superiority of proximal methods compared to subgradient-based approaches via extensive experiments. Interestingly, our results indicate that proximal methods with non-convex regularizers are more effective than those with convex regularizers.
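As a rough illustration of a proximal step with a positive diagonal preconditioner, the sketch below covers only the two endpoint cases q = 1 (soft thresholding) and q = 0 (hard thresholding), whose closed forms are standard. It assumes the per-coordinate subproblem min_z λ·R(z) + (c_i / (2·lr))·(z − z_i)², where z_i is the preconditioned gradient step. The function names and this exact formulation are assumptions for illustration, not the ProxGen update as defined in the paper, and intermediate values of q are not handled.

```python
import math

def soft_threshold(z, tau):
    """Prox of tau * |z| (the q = 1 case): shrink z toward zero by tau."""
    return math.copysign(max(abs(z) - tau, 0.0), z)

def hard_threshold(z, tau):
    """Prox of tau * 1[z != 0] (the q = 0 case): keep z only if
    0.5 * z^2 exceeds the penalty tau, otherwise set it to zero."""
    return z if 0.5 * z * z > tau else 0.0

def preconditioned_prox_step(theta, grad, precond, lr, lam, q):
    """One proximal gradient step with a positive diagonal preconditioner.

    Each coordinate takes the step z = theta - lr * grad / c and is then
    thresholded with the coordinate-specific level tau = lr * lam / c.
    """
    if q not in (0, 1):
        raise ValueError("this sketch only covers q = 0 and q = 1")
    prox = soft_threshold if q == 1 else hard_threshold
    updated = []
    for t, g, c in zip(theta, grad, precond):
        z = t - lr * g / c
        tau = lr * lam / c
        updated.append(prox(z, tau))
    return updated

# Hypothetical values; a zero gradient isolates the effect of the prox.
theta = [2.0, -1.0, 0.2]
grad = [0.0, 0.0, 0.0]
precond = [1.0, 2.0, 1.0]
print(preconditioned_prox_step(theta, grad, precond, lr=0.5, lam=1.0, q=1))
# [1.5, -0.75, 0.0]
print(preconditioned_prox_step(theta, grad, precond, lr=0.5, lam=1.0, q=0))
# [2.0, -1.0, 0.0]
```

A larger preconditioner entry lowers the threshold for that coordinate, so the regularizer acts less aggressively where the preconditioner is large.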

Machine Learning Optimization and Control Machine Learning
