A Stochastic Gradient Descent Theorem and the Back-Propagation Algorithm

Published by: Hao Wu

Publication date: 2021

Author: Hao Wu

Language: English

Research field: Optimization and Control

Abstract

We establish a convergence theorem for a certain type of stochastic gradient descent, which leads to a convergent variant of the back-propagation algorithm.

Read also

Stochastic Reweighted Gradient Descent

Ayoub El Hanchi, David A. Stephens, 2021

Despite the strong theoretical guarantees that variance-reduced finite-sum optimization algorithms enjoy, their applicability remains limited to cases where the memory overhead they introduce (SAG/SAGA), or the periodic full gradient computation they require (SVRG/SARAH) are manageable. A promising approach to achieving variance reduction while avoiding these drawbacks is the use of importance sampling instead of control variates. While many such methods have been proposed in the literature, directly proving that they improve the convergence of the resulting optimization algorithm has remained elusive. In this work, we propose an importance-sampling-based algorithm we call SRG (stochastic reweighted gradient). We analyze the convergence of SRG in the strongly-convex case and show that, while it does not recover the linear rate of control variates methods, it provably outperforms SGD. We pay particular attention to the time and memory overhead of our proposed method, and design a specialized red-black tree allowing its efficient implementation. Finally, we present empirical results to support our findings.

Optimization and Control, Machine Learning
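The abstract by El Hanchi and Stephens contrasts importance sampling with control variates as a route to variance reduction. The following is a minimal sketch of a generic importance-weighted gradient estimate, not the SRG algorithm itself, whose sampling distribution and red-black tree are not described on this page. The function names and the use of scalar per-example gradients are assumptions made for illustration.

```python
import random


def importance_weighted_gradient(grads, probs, i):
    """Estimate the average gradient from example i drawn with probability probs[i].

    Dividing by n * probs[i] makes the estimate unbiased for mean(grads).
    """
    n = len(grads)
    return grads[i] / (n * probs[i])


def expected_estimate(grads, probs):
    """Exact expectation of the estimate over the sampling distribution."""
    return sum(
        p * importance_weighted_gradient(grads, probs, i)
        for i, p in enumerate(probs)
    )


def sample_index(probs, rng):
    """Draw one example index according to probs (hypothetical helper)."""
    return rng.choices(range(len(probs)), weights=probs)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [1.0, 2.0, 6.0]   # illustrative scalar per-example gradients
    probs = [0.2, 0.3, 0.5]   # illustrative non-uniform sampling distribution
    i = sample_index(probs, rng)
    print("sampled index:", i)
    print("single-sample estimate:", importance_weighted_gradient(grads, probs, i))
    print("expectation of the estimate:", expected_estimate(grads, probs))
```

Any strictly positive sampling distribution keeps the estimate unbiased; the choice of distribution only affects its variance, which is the quantity such methods try to reduce.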

Biased Stochastic Gradient Descent for Conditional Stochastic Optimization

Yifan Hu, Siqi Zhang, Xin Chen, 2020

Conditional Stochastic Optimization (CSO) covers a variety of applications ranging from meta-learning and causal inference to invariant learning. However, constructing unbiased gradient estimates in CSO is challenging due to the composition structure. As an alternative, we propose a biased stochastic gradient descent (BSGD) algorithm and study the bias-variance tradeoff under different structural assumptions. We establish the sample complexities of BSGD for strongly convex, convex, and weakly convex objectives, under smooth and non-smooth conditions. We also provide matching lower bounds of BSGD for convex CSO objectives. Extensive numerical experiments are conducted to illustrate the performance of BSGD on robust logistic regression, model-agnostic meta-learning (MAML), and instrumental variable regression (IV).

Optimization and Control, Machine Learning

Convergence and Alignment of Gradient Descent with Random Back Propagation Weights

Ganlin Song, Ruitu Xu, John Lafferty, 2021

Stochastic gradient descent with backpropagation is the workhorse of artificial neural networks. It has long been recognized that backpropagation fails to be a biologically plausible algorithm. Fundamentally, it is a non-local procedure -- updating one neuron's synaptic weights requires knowledge of synaptic weights or receptive fields of downstream neurons. This limits the use of artificial neural networks as a tool for understanding the biological principles of information processing in the brain. Lillicrap et al. (2016) propose a more biologically plausible feedback alignment algorithm that uses random and fixed backpropagation weights, and show promising simulations. In this paper we study the mathematical properties of the feedback alignment procedure by analyzing convergence and alignment for two-layer networks under squared error loss. In the overparameterized setting, we prove that the error converges to zero exponentially fast, and also that regularization is necessary in order for the parameters to become aligned with the random backpropagation weights. Simulations are given that are consistent with this analysis and suggest further generalizations. These results contribute to our understanding of how biologically plausible algorithms might carry out weight learning in a manner different from Hebbian learning, with performance that is comparable with the full non-local backpropagation algorithm.

Machine Learning
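The feedback alignment idea described in the abstract by Song, Xu and Lafferty can be sketched for a two-layer network under squared error loss. This is a minimal illustration, not the authors' code: it assumes a linear hidden layer for brevity, a single training example per step, and hypothetical names (`feedback_alignment_step`, `W`, `beta`, `b`). The only change from backpropagation is that the hidden-layer update uses the fixed random vector `b` where the true gradient would use the output weights `beta`.

```python
def feedback_alignment_step(W, beta, b, x, y, lr):
    """One feedback alignment update for f(x) = beta . (W x), loss 0.5 * (f(x) - y)**2.

    W    : hidden-layer weights, a list of rows
    beta : output weights, updated with their true gradient
    b    : fixed random feedback weights, never updated
    """
    # Forward pass: hidden activations (linear, by assumption) and prediction error.
    h = [sum(w * x_j for w, x_j in zip(row, x)) for row in W]
    err = sum(beta_r * h_r for beta_r, h_r in zip(beta, h)) - y

    # Output layer: ordinary gradient step.
    new_beta = [beta_r - lr * err * h_r for beta_r, h_r in zip(beta, h)]

    # Hidden layer: the error is sent back through b instead of beta.
    new_W = [
        [w - lr * err * b_r * x_j for w, x_j in zip(row, x)]
        for row, b_r in zip(W, b)
    ]
    return new_W, new_beta, err


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    beta = [0.0, 0.0]
    b = [1.0, -1.0]            # illustrative fixed feedback weights
    x, y = [1.0, 2.0], 1.0     # a single illustrative training example
    for step in range(5):
        W, beta, err = feedback_alignment_step(W, beta, b, x, y, lr=0.1)
        print(step, err)
```

With `beta` initialised to zero, the exact backpropagation gradient for `W` vanishes on the first step, while the feedback alignment update still moves `W` because it depends on `b` rather than `beta`.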

Stochastic gradient descent and fast relaxation to thermodynamic equilibrium: a stochastic control approach

Tobias Breiten, Carsten Hartmann, Lara Neureither, 2021

We study the convergence to equilibrium of an underdamped Langevin equation that is controlled by a linear feedback force. Specifically, we are interested in sampling the possibly multimodal invariant probability distribution of a Langevin system at small noise (or low temperature), for which the dynamics can easily get trapped inside metastable subsets of the phase space. We follow [Chen et al., J. Math. Phys. 56, 113302, 2015] and consider a Langevin equation that is simulated at a high temperature, with the control playing the role of a friction that balances the additional noise so as to restore the original invariant measure at a lower temperature. We discuss different limits as the temperature ratio goes to infinity and prove convergence to a limit dynamics. It turns out that, depending on whether the lower (target) or the higher (simulation) temperature is fixed, the controlled dynamics converges either to the overdamped Langevin equation or to a deterministic gradient flow. This implies that (a) the ergodic limit and the large temperature separation limit do not commute in general, and that (b) it is not possible to accelerate the speed of convergence to the ergodic limit by making the temperature separation larger and larger. We discuss the implications of these observations from the perspective of stochastic optimisation algorithms and enhanced sampling schemes in molecular dynamics.

Optimization and Control, Probability

On the Convergence Rate of Projected Gradient Descent for a Back-Projection based Objective

Tom Tirer, Raja Giryes, 2020

Ill-posed linear inverse problems appear in many scientific setups, and are typically addressed by solving optimization problems, which are composed of data fidelity and prior terms. Recently, several works have considered a back-projection (BP) based fidelity term as an alternative to the common least squares (LS), and demonstrated excellent results for popular inverse problems. These works have also empirically shown that using the BP term, rather than the LS term, requires fewer iterations of optimization algorithms. In this paper, we examine the convergence rate of the projected gradient descent (PGD) algorithm for the BP objective. Our analysis allows us to identify an inherent source for its faster convergence compared to using the LS objective, while making only mild assumptions. We also analyze the more general proximal gradient method under a relaxed contraction condition on the proximal mapping of the prior. This analysis further highlights the advantage of BP when the linear measurement operator is badly conditioned. Numerical experiments with both $\ell_1$-norm and GAN-based priors corroborate our theoretical results.

Optimization and Control, Computer Vision and Pattern Recognition, Machine Learning
