Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

A Spectral Approach to Gradient Estimation for Implicit Distributions

105 0 0.0 ( 0 )

Download Cite

Added by Jiaxin Shi

Publication date 2018

fields Mathematical Statistics Informatics Engineering

and research's language is English

Authors Jiaxin Shi - Shengyang Sun - Jun Zhu

Machine Learning Machine Learning Neural and Evolutionary Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Recently there have been increasing interests in learning and inference with implicit distributions (i.e., distributions without tractable densities). To this end, we develop a gradient estimator for implicit distributions based on Steins identity and a spectral decomposition of kernel operators, where the eigenfunctions are approximated by the Nystrom method. Unlike the previous works that only provide estimates at the sample points, our approach directly estimates the gradient function, thus allows for a simple and principled out-of-sample extension. We provide theoretical results on the error bound of the estimator and discuss the bias-variance tradeoff in practice. The effectiveness of our method is demonstrated by applications to gradient-free Hamiltonian Monte Carlo and variational inference with implicit distributions. Finally, we discuss the intuition behind the estimator by drawing connections between the Nystrom method and kernel PCA, which indicates that the estimator can automatically adapt to the geometry of the underlying distribution.

rate research

A Unified Stochastic Gradient Approach to Designing Bayesian-Optimal Experiments

204 - Adam Foster , Martin Jankowiak , Matthew OMeara 2019

We introduce a fully stochastic gradient based approach to Bayesian optimal experimental design (BOED). Our approach utilizes variational lower bounds on the expected information gain (EIG) of an experiment that can be simultaneously optimized with respect to both the variational and design parameters. This allows the design process to be carried out through a single unified stochastic gradient ascent procedure, in contrast to existing approaches that typically construct a pointwise EIG estimator, before passing this estimator to a separate optimizer. We provide a number of different variational objectives including the novel adaptive contrastive estimation (ACE) bound. Finally, we show that our gradient-based approaches are able to provide effective design optimization in substantially higher dimensional settings than existing approaches.

Machine Learning Machine Learning Computation

The Implicit Regularization of Stochastic Gradient Flow for Least Squares

194 - Alnur Ali , Edgar Dobriban , 2020

We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow. We give a bound on the excess risk of stochastic gradient flow at time $t$, over ridge regression with tuning parameter $lambda = 1/t$. The bound may be computed from explicit constants (e.g., the mini-batch size, step size, number of iterations), revealing precisely how these quantities drive the excess risk. Numerical examples show the bound can be small, indicating a tight relationship between the two estimators. We give a similar result relating the coefficients of stochastic gradient flow and ridge. These results hold under no conditions on the data matrix $X$, and across the entire optimization path (not just at convergence).

Machine Learning Machine Learning Optimization and Control

The Implicit Bias of Gradient Descent on Separable Data

247 - Daniel Soudry , Elad Hoffer , Mor Shpigel Nacson 2017

We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization n more complex models and with other optimization methods.

Machine Learning Machine Learning

GBHT: Gradient Boosting Histogram Transform for Density Estimation

97 - Jingyi Cui , Hanyuan Hang , Yisen Wang 2021

In this paper, we propose a density estimation algorithm called textit{Gradient Boosting Histogram Transform} (GBHT), where we adopt the textit{Negative Log Likelihood} as the loss function to make the boosting procedure available for the unsupervised tasks. From a learning theory viewpoint, we first prove fast convergence rates for GBHT with the smoothness assumption that the underlying density function lies in the space $C^{0,alpha}$. Then when the target density function lies in spaces $C^{1,alpha}$, we present an upper bound for GBHT which is smaller than the lower bound of its corresponding base learner, in the sense of convergence rates. To the best of our knowledge, we make the first attempt to theoretically explain why boosting can enhance the performance of its base learners for density estimation problems. In experiments, we not only conduct performance comparisons with the widely used KDE, but also apply GBHT to anomaly detection to showcase a further application of GBHT.

Machine Learning Machine Learning

A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)

119 - Prateek Jain , Sham M. Kakade , Rahul Kidambi 2017

This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares. This result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite rate optimality characterization captures the constant factors and addresses model mis-specification.

Machine Learning Machine Learning Optimization and Control

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

A Spectral Approach to Gradient Estimation for Implicit Distributions

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions