بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Holonomic Gradient Descent and its Application to Fisher-Bingham Integral

465 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Nobuki Takayama

تاريخ النشر 2010

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Tomonari Sei - Nobuki Takayama - Akimichi Takemura

الحساب الرمزي التحسين والتحكم نظرية الإحصاء

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We give a new algorithm to find local maximum and minimum of a holonomic function and apply it for the Fisher-Bingham integral on the sphere $S^n$, which is used in the directional statistics. The method utilizes the theory and algorithms of holonomic systems.

قيم البحث

485 - Katsuyoshi Ohara , Nobuki Takayama 2015

We give two efficient methods to derive Pfaffian systems for A-hypergeometric systems for the application to the holonomic gradient method for statistics. We utilize the Hilbert driven Buchberger algorithm and Macaulay type matrices in the two methods.

الحساب الرمزي التحليل الكلاسيكي و ODEs

Stochastic Gradient Descent with Biased but Consistent Gradient Estimators

88 - Jie Chen , Ronny Luss 2018

Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss functions a nd training nonconvex deep neural networks. The theory assumes that one can easily compute an unbiased gradient estimator, which is usually the case due to the sample average nature of empirical risk minimization. There exist, however, many scenarios (e.g., graphs) where an unbiased estimator may be as expensive to compute as the full gradient because training examples are interconnected. Recently, Chen et al. (2018) proposed using a consistent gradient estimator as an economic alternative. Encouraged by empirical success, we show, in a general setting, that consistent estimators result in the same convergence behavior as do unbiased ones. Our analysis covers strongly convex, convex, and nonconvex objectives. We verify the results with illustrative experiments on synthetic and real-world data. This work opens several new research directions, including the development of more efficient SGD updates with consistent estimators and the design of efficient training algorithms for large-scale graphs.

التعلم الآلي التحسين والتحكم التعلم الالي

Learning to Initialize Gradient Descent Using Gradient Descent

356 - Kartik Ahuja , Amit Dhurandhar , Kush R. Varshney 2020

Non-convex optimization problems are challenging to solve; the success and computational expense of a gradient descent algorithm or variant depend heavily on the initialization strategy. Often, either random initialization is used or initialization r ules are carefully designed by exploiting the nature of the problem class. As a simple alternative to hand-crafted initialization rules, we propose an approach for learning good initialization rules from previous solutions. We provide theoretical guarantees that establish conditions that are sufficient in all cases and also necessary in some under which our approach performs better than random initialization. We apply our methodology to various non-convex problems such as generating adversarial examples, generating post hoc explanations for black-box machine learning models, and allocating communication spectrum, and show consistent gains over other initialization techniques.

التعلم الآلي التعلم الالي

Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions

117 - Yunwen Lei , Ting Hu , Guiying Li 2019

Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing theoretical results for SGD applied to nonconvex objective functions are far from mature. For example, existing results require to impose a nontrivial assumption on the uniform boundedness of gradients for all iterates encountered in the learning process, which is hard to verify in practical implementations. In this paper, we establish a rigorous theoretical foundation for SGD in nonconvex learning by showing that this boundedness assumption can be removed without affecting convergence rates. In particular, we establish sufficient conditions for almost sure convergence as well as optimal convergence rates for SGD applied to both general nonconvex objective functions and gradient-dominated objective functions. A linear convergence is further derived in the case with zero variances.

التعلم الآلي التحسين والتحكم التعلم الالي

Decentralised Learning with Random Features and Distributed Gradient Descent

116 - Dominic Richards , Patrick Rebeschini , Lorenzo Rosasco 2020

We investigate the generalisation performance of Distributed Gradient Descent with Implicit Regularisation and Random Features in the homogenous setting where a network of agents are given data sampled independently from the same unknown distribution . Along with reducing the memory footprint, Random Features are particularly convenient in this setting as they provide a common parameterisation across agents that allows to overcome previous difficulties in implementing Decentralised Kernel Regression. Under standard source and capacity assumptions, we establish high probability bounds on the predictive performance for each agent as a function of the step size, number of iterations, inverse spectral gap of the communication matrix and number of Random Features. By tuning these parameters, we obtain statistical rates that are minimax optimal with respect to the total number of samples in the network. The algorithm provides a linear improvement over single machine Gradient Descent in memory cost and, when agents hold enough data with respect to the network size and inverse spectral gap, a linear speed-up in computational runtime for any network topology. We present simulations that show how the number of Random Features, iterations and samples impact predictive performance.

التعلم الالي التعلم الآلي نظرية الإحصاء

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الجزيرة الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Holonomic Gradient Descent and its Application to Fisher-Bingham Integral

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

We give a new algorithm to find local maximum and minimum of a holonomic function and apply it for the Fisher-Bingham integral on the sphere $S^n$, which is used in the directional statistics. The method utilizes the theory and algorithms of holonomic systems.

اقرأ أيضاً