Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Holonomic Gradient Descent and its Application to Fisher-Bingham Integral

441 0 0.0 ( 0 )

Download Cite

Added by Nobuki Takayama

Publication date 2010

fields Informatics Engineering

and research's language is English

Authors Tomonari Sei - Nobuki Takayama - Akimichi Takemura

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We give a new algorithm to find local maximum and minimum of a holonomic function and apply it for the Fisher-Bingham integral on the sphere $S^n$, which is used in the directional statistics. The method utilizes the theory and algorithms of holonomic systems.

rate research

Pfaffian Systems of A-Hypergeometric Systems II --- Holonomic Gradient Method

453 - Katsuyoshi Ohara , Nobuki Takayama 2015

We give two efficient methods to derive Pfaffian systems for A-hypergeometric systems for the application to the holonomic gradient method for statistics. We utilize the Hilbert driven Buchberger algorithm and Macaulay type matrices in the two methods.

Symbolic Computation Classical Analysis and ODEs

Stochastic Gradient Descent with Biased but Consistent Gradient Estimators

88 - Jie Chen , Ronny Luss 2018

Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss functions and training nonconvex deep neural networks. The theory assumes that one can easily compute an unbiased gradient estimator, which is usually the case due to the sample average nature of empirical risk minimization. There exist, however, many scenarios (e.g., graphs) where an unbiased estimator may be as expensive to compute as the full gradient because training examples are interconnected. Recently, Chen et al. (2018) proposed using a consistent gradient estimator as an economic alternative. Encouraged by empirical success, we show, in a general setting, that consistent estimators result in the same convergence behavior as do unbiased ones. Our analysis covers strongly convex, convex, and nonconvex objectives. We verify the results with illustrative experiments on synthetic and real-world data. This work opens several new research directions, including the development of more efficient SGD updates with consistent estimators and the design of efficient training algorithms for large-scale graphs.

Machine Learning Optimization and Control Machine Learning

Learning to Initialize Gradient Descent Using Gradient Descent

356 - Kartik Ahuja , Amit Dhurandhar , Kush R. Varshney 2020

Non-convex optimization problems are challenging to solve; the success and computational expense of a gradient descent algorithm or variant depend heavily on the initialization strategy. Often, either random initialization is used or initialization rules are carefully designed by exploiting the nature of the problem class. As a simple alternative to hand-crafted initialization rules, we propose an approach for learning good initialization rules from previous solutions. We provide theoretical guarantees that establish conditions that are sufficient in all cases and also necessary in some under which our approach performs better than random initialization. We apply our methodology to various non-convex problems such as generating adversarial examples, generating post hoc explanations for black-box machine learning models, and allocating communication spectrum, and show consistent gains over other initialization techniques.

Machine Learning Machine Learning

Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions

117 - Yunwen Lei , Ting Hu , Guiying Li 2019

Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing theoretical results for SGD applied to nonconvex objective functions are far from mature. For example, existing results require to impose a nontrivial assumption on the uniform boundedness of gradients for all iterates encountered in the learning process, which is hard to verify in practical implementations. In this paper, we establish a rigorous theoretical foundation for SGD in nonconvex learning by showing that this boundedness assumption can be removed without affecting convergence rates. In particular, we establish sufficient conditions for almost sure convergence as well as optimal convergence rates for SGD applied to both general nonconvex objective functions and gradient-dominated objective functions. A linear convergence is further derived in the case with zero variances.

Machine Learning Optimization and Control Machine Learning

Decentralised Learning with Random Features and Distributed Gradient Descent

116 - Dominic Richards , Patrick Rebeschini , Lorenzo Rosasco 2020

We investigate the generalisation performance of Distributed Gradient Descent with Implicit Regularisation and Random Features in the homogenous setting where a network of agents are given data sampled independently from the same unknown distribution. Along with reducing the memory footprint, Random Features are particularly convenient in this setting as they provide a common parameterisation across agents that allows to overcome previous difficulties in implementing Decentralised Kernel Regression. Under standard source and capacity assumptions, we establish high probability bounds on the predictive performance for each agent as a function of the step size, number of iterations, inverse spectral gap of the communication matrix and number of Random Features. By tuning these parameters, we obtain statistical rates that are minimax optimal with respect to the total number of samples in the network. The algorithm provides a linear improvement over single machine Gradient Descent in memory cost and, when agents hold enough data with respect to the network size and inverse spectral gap, a linear speed-up in computational runtime for any network topology. We present simulations that show how the number of Random Features, iterations and samples impact predictive performance.

Machine Learning Machine Learning Statistics Theory

comments

Fetching comments

University of Aleppo

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Holonomic Gradient Descent and its Application to Fisher-Bingham Integral

Ask ChatGPT about the research

No Arabic abstract

We give a new algorithm to find local maximum and minimum of a holonomic function and apply it for the Fisher-Bingham integral on the sphere $S^n$, which is used in the directional statistics. The method utilizes the theory and algorithms of holonomic systems.

Read More