A Spectral Approach to Gradient Estimation for Implicit Distributions

Published by Jiaxin Shi
Publication date: 2018
Language: English

Recently there has been increasing interest in learning and inference with implicit distributions (i.e., distributions without tractable densities). To this end, we develop a gradient estimator for implicit distributions based on Stein's identity and a spectral decomposition of kernel operators, where the eigenfunctions are approximated by the Nyström method. Unlike previous works that only provide estimates at the sample points, our approach directly estimates the gradient function and thus allows for a simple and principled out-of-sample extension. We provide theoretical results on the error bound of the estimator and discuss the bias-variance tradeoff in practice. The effectiveness of our method is demonstrated by applications to gradient-free Hamiltonian Monte Carlo and variational inference with implicit distributions. Finally, we discuss the intuition behind the estimator by drawing connections between the Nyström method and kernel PCA, which indicates that the estimator can automatically adapt to the geometry of the underlying distribution.
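
For concreteness, below is a minimal NumPy sketch of this construction: it forms the kernel Gram matrix over samples, takes the top-J eigenpairs, builds Nyström approximations of the eigenfunctions, and combines them via Stein's identity into a gradient estimate at arbitrary query points. This is an illustration under assumed choices (an RBF kernel with a fixed bandwidth `sigma`, a fixed truncation `J`), not the authors' reference implementation.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all row pairs."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-d2 / (2.0 * sigma**2))

def ssge(samples, queries, J=6, sigma=1.0):
    """Estimate grad_x log q(x) at `queries` from samples ~ q."""
    M = samples.shape[0]
    K = rbf_kernel(samples, samples, sigma)          # (M, M) Gram matrix
    eigvals, eigvecs = np.linalg.eigh(K)             # ascending eigenvalues
    lam = eigvals[::-1][:J]                          # top-J eigenvalues
    U = eigvecs[:, ::-1][:, :J]                      # matching eigenvectors, (M, J)

    # Nystrom eigenfunctions: psi_j(x) = sqrt(M)/lam_j * sum_m U[m, j] k(x, x_m)
    psi = np.sqrt(M) * rbf_kernel(queries, samples, sigma) @ U / lam   # (N, J)

    # Gradient of the RBF kernel in its first argument, at the sample points:
    # grad_x k(x, x_m) |_{x = x_i} = -(x_i - x_m) / sigma^2 * k(x_i, x_m)
    diff = samples[:, None, :] - samples[None, :, :]   # (M, M, d)
    gradK = -diff * K[:, :, None] / sigma**2           # (M, M, d)

    # Stein's identity gives beta_j = -(1/M) sum_i grad psi_j(x_i)
    grad_psi = np.sqrt(M) * np.einsum('imd,mj->ijd', gradK, U) / lam[None, :, None]
    beta = -grad_psi.mean(axis=0)                      # (J, d)

    return psi @ beta                                  # (N, d) estimated gradients

# Sanity check: for q = N(0, I), the true score is grad log q(x) = -x.
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 2))
print(np.abs(ssge(x, x) - (-x)).mean())   # should be small relative to the scale of x
```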

Read also

We introduce a fully stochastic gradient based approach to Bayesian optimal experimental design (BOED). Our approach utilizes variational lower bounds on the expected information gain (EIG) of an experiment that can be simultaneously optimized with respect to both the variational and design parameters. This allows the design process to be carried out through a single unified stochastic gradient ascent procedure, in contrast to existing approaches that typically construct a pointwise EIG estimator, before passing this estimator to a separate optimizer. We provide a number of different variational objectives including the novel adaptive contrastive estimation (ACE) bound. Finally, we show that our gradient-based approaches are able to provide effective design optimization in substantially higher dimensional settings than existing approaches.
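
As a toy illustration of the single-loop idea (not the paper's ACE bound or code), the sketch below ascends a Barber-Agakov-style variational lower bound on the EIG for a one-dimensional linear-Gaussian model, updating the variational parameters (a, log s) and the design d = tanh(rho) simultaneously; the model, parametrization, and constants are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: theta ~ N(0, 1), y | theta, d ~ N(d * theta, 1), design d in (-1, 1).
# Lower bound: EIG(d) >= E[log q_phi(theta | y)] + const, with q_phi = N(a * y, s^2).
a, log_s, rho = 0.1, 0.0, 0.0
lr, batch = 0.05, 256

for step in range(2000):
    d = np.tanh(rho)                                   # keep the design feasible
    theta = rng.standard_normal(batch)
    y = d * theta + rng.standard_normal(batch)         # simulate the experiment
    s2 = np.exp(2 * log_s)
    r = theta - a * y                                  # residual of q's mean
    # Pathwise (reparameterized) gradients of E[log q_phi(theta | y)]:
    g_a = np.mean(r * y / s2)
    g_logs = np.mean(-1.0 + r**2 / s2)
    g_rho = np.mean(a * r * theta / s2) * (1 - d**2)   # chain rule through tanh
    a, log_s, rho = a + lr * g_a, log_s + lr * g_logs, rho + lr * g_rho

print("optimized design d =", np.tanh(rho))            # moves toward the boundary |d| = 1
```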
Alnur Ali, Edgar Dobriban, 2020
We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow. We give a bound on the excess risk of stochastic gradient flow at time $t$, over ridge regression with tuning parameter $\lambda = 1/t$. The bound may be computed from explicit constants (e.g., the mini-batch size, step size, number of iterations), revealing precisely how these quantities drive the excess risk. Numerical examples show the bound can be small, indicating a tight relationship between the two estimators. We give a similar result relating the coefficients of stochastic gradient flow and ridge. These results hold under no conditions on the data matrix $X$, and across the entire optimization path (not just at convergence).
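
A quick numerical check of this correspondence (an illustrative experiment, not the paper's code; the problem sizes and step size are arbitrary choices) is to run mini-batch SGD on a synthetic least-squares problem and compare the iterate at flow time t = step size x iterations against ridge regression with $\lambda = 1/t$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)

eta, batch = 0.01, 20                      # step size and mini-batch size
beta = np.zeros(p)
for k in range(1, 2001):
    idx = rng.choice(n, batch, replace=False)
    beta -= eta * X[idx].T @ (X[idx] @ beta - y[idx]) / batch
    if k % 500 == 0:
        t = eta * k                        # elapsed "flow" time
        ridge = np.linalg.solve(X.T @ X / n + (1.0 / t) * np.eye(p), X.T @ y / n)
        # the distance below is the quantity the paper's theory bounds
        print(f"t = {t:5.1f}   ||sgd - ridge(1/t)|| = {np.linalg.norm(beta - ridge):.4f}")
```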
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
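
The slow, logarithmic convergence of the direction is easy to observe numerically. The sketch below (an illustration on synthetic separable data, with an arbitrary step size, not the paper's experiments) runs plain gradient descent on the logistic loss and prints the normalized iterate and its minimum margin, which keep creeping up long after the training error is zero:

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, 1.0])
X = rng.standard_normal((500, 2))
keep = np.abs(X @ w_true) > 0.3            # enforce strict linear separability
X = X[keep]
y = np.sign(X @ w_true)                    # labels in {-1, +1}

w = np.zeros(2)
eta = 1.0
for k in range(1, 100001):
    m = y * (X @ w)                        # per-example margins
    s = 0.5 * (1.0 - np.tanh(m / 2.0))     # numerically stable sigmoid(-m)
    w += eta * (X * (y * s)[:, None]).mean(axis=0)   # descent step on logistic loss
    if k in (10, 100, 1000, 10000, 100000):
        u = w / np.linalg.norm(w)
        print(f"iter {k:6d}  direction {u}  min normalized margin {(y * (X @ u)).min():.4f}")
```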
In this paper, we propose a density estimation algorithm called Gradient Boosting Histogram Transform (GBHT), where we adopt the Negative Log Likelihood as the loss function to make the boosting procedure available for unsupervised tasks. From a learning theory viewpoint, we first prove fast convergence rates for GBHT under the smoothness assumption that the underlying density function lies in the space $C^{0,\alpha}$. Then, when the target density function lies in spaces $C^{1,\alpha}$, we present an upper bound for GBHT which is smaller than the lower bound of its corresponding base learner, in the sense of convergence rates. To the best of our knowledge, we make the first attempt to theoretically explain why boosting can enhance the performance of its base learners for density estimation problems. In experiments, we not only conduct performance comparisons with the widely used KDE, but also apply GBHT to anomaly detection to showcase a further application of GBHT.
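
To make the base learner concrete, here is a minimal sketch of a single histogram-transform density estimator: a histogram built in randomly rotated and shifted coordinates. This is an assumed reading of the base learner, not the paper's code, and the averaging at the end is a plain ensemble; the paper instead boosts such learners under the negative log-likelihood loss.

```python
import numpy as np

def ht_density(train, s=1.0, seed=0):
    """One base learner: a histogram estimator in randomly rotated/shifted coordinates."""
    rng = np.random.default_rng(seed)
    d = train.shape[1]
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random rotation, |det Q| = 1
    b = rng.uniform(0.0, s, d)                         # random bin offset
    Z = np.floor((train @ Q.T + b) / s).astype(int)    # bin index per training point
    bins, counts = np.unique(Z, axis=0, return_counts=True)
    table = {tuple(k): c for k, c in zip(bins, counts)}
    n, vol = len(train), s ** d                        # rotation preserves cell volume
    def density(x):
        key = tuple(np.floor((x @ Q.T + b) / s).astype(int))
        return table.get(key, 0) / (n * vol)
    return density

rng = np.random.default_rng(1)
data = rng.standard_normal((5000, 2))
learners = [ht_density(data, s=0.8, seed=k) for k in range(10)]
est = np.mean([f(np.zeros(2)) for f in learners])      # naive ensemble average
print(est)   # roughly 1/(2 pi) ~ 0.159, the N(0, I) density at the origin
```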
This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares. This result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite rate optimality characterization captures the constant factors and addresses model mis-specification.
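
As a small numerical companion (a toy illustration with arbitrary constants, not the paper's analysis), the sketch below runs single-pass constant-step SGD on a synthetic least-squares problem with identity covariance and compares the excess risk of the last iterate against the tail-averaged iterate, the quantity the minimax result concerns:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5000, 10
beta_star = rng.standard_normal(p)
X = rng.standard_normal((n, p))
y = X @ beta_star + rng.standard_normal(n)

eta = 0.02                                   # constant step size, single pass
w, tail_sum = np.zeros(p), np.zeros(p)
for i in range(n):
    w -= eta * (X[i] @ w - y[i]) * X[i]      # one-sample stochastic gradient step
    if i >= n // 2:
        tail_sum += w                        # average the second half of the iterates
w_avg = tail_sum / (n - n // 2)

excess = lambda b: np.sum((b - beta_star) ** 2)   # excess risk when Sigma = I
print("last iterate excess risk :", excess(w))
print("tail-averaged excess risk:", excess(w_avg))   # averaging suppresses stationary noise
```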
