Direct Density-Derivative Estimation and Its Application in KL-Divergence Approximation

46 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Hiroaki Sasaki

تاريخ النشر 2014

مجال البحث الاحصاء الرياضي

والبحث باللغة English

تأليف Hiroaki Sasaki - Yung-Kyun Noh - Masashi Sugiyama

التعلم الالي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Estimation of density derivatives is a versatile tool in statistical data analysis. A naive approach is to first estimate the density and then compute its derivative. However, such a two-step approach does not work well because a good density estimator does not necessarily mean a good density-derivative estimator. In this paper, we give a direct method to approximate the density derivative without estimating the density itself. Our proposed estimator allows analytic and computationally efficient approximation of multi-dimensional high-order density derivatives, with the ability that all hyper-parameters can be chosen objectively by cross-validation. We further show that the proposed density-derivative estimator is useful in improving the accuracy of non-parametric KL-divergence estimation via metric learning. The practical superiority of the proposed method is experimentally demonstrated in change detection and feature selection.

قيم البحث

128 - Zhijian Ou , Yunfu Song 2020

Although with progress in introducing auxiliary amortized inference models, learning discrete latent variable models is still challenging. In this paper, we show that the annoying difficulty of obtaining reliable stochastic gradients for the inferenc e model and the drawback of indirectly optimizing the target log-likelihood can be gracefully addressed in a new method based on stochastic approximation (SA) theory of the Robbins-Monro type. Specifically, we propose to directly maximize the target log-likelihood and simultaneously minimize the inclusive divergence between the posterior and the inference model. The resulting learning algorithm is called joint SA (JSA). To the best of our knowledge, JSA represents the first method that couples an SA version of the EM (expectation-maximization) algorithm (SAEM) with an adaptive MCMC procedure. Experiments on several benchmark generative modeling and structured prediction tasks show that JSA consistently outperforms recent competitive algorithms, with faster convergence, better final likelihoods, and lower variance of gradient estimates.

التعلم الالي التعلم الآلي

Centroid estimation based on symmetric KL divergence for Multinomial text classification problem

349 - Jiangning Chen , Heinrich Matzinger , Haoyan Zhai 2018

We define a new method to estimate centroid for text classification based on the symmetric KL-divergence between the distribution of words in training documents and their class centroids. Experiments on several standard data sets indicate that the ne w method achieves substantial improvements over the traditional classifiers.

استرجاع المعلومات التعلم الآلي التعلم الالي

The Success of AdaBoost and Its Application in Portfolio Management

62 - Yijian Chuan , Chaoyi Zhao , Zhenrui He 2021

We develop a novel approach to explain why AdaBoost is a successful classifier. By introducing a measure of the influence of the noise points (ION) in the training data for the binary classification problem, we prove that there is a strong connection between the ION and the test error. We further identify that the ION of AdaBoost decreases as the iteration number or the complexity of the base learners increases. We confirm that it is impossible to obtain a consistent classifier without deep trees as the base learners of AdaBoost in some complicated situations. We apply AdaBoost in portfolio management via empirical studies in the Chinese market, which corroborates our theoretical propositions.

التعلم الالي التعلم الآلي إدارة المحافظ

Innovations Autoencoder and its Application in One-class Anomalous Sequence Detection

213 - Xinyi Wang , Lang Tong 2021

An innovations sequence of a time series is a sequence of independent and identically distributed random variables with which the original time series has a causal representation. The innovation at a time is statistically independent of the history o f the time series. As such, it represents the new information contained at present but not in the past. Because of its simple probability structure, an innovations sequence is the most efficient signature of the original. Unlike the principle or independent component analysis representations, an innovations sequence preserves not only the complete statistical properties but also the temporal order of the original time series. An long-standing open problem is to find a computationally tractable way to extract an innovations sequence of non-Gaussian processes. This paper presents a deep learning approach, referred to as Innovations Autoencoder (IAE), that extracts innovations sequences using a causal convolutional neural network. An application of IAE to the one-class anomalous sequence detection problem with unknown anomaly and anomaly-free models is also presented.

التعلم الالي التعلم الآلي

How to use KL-divergence to construct conjugate priors, with well-defined non-informative limits, for the multivariate Gaussian

90 - Niko Brummer 2021

The Wishart distribution is the standard conjugate prior for the precision of the multivariate Gaussian likelihood, when the mean is known -- while the normal-Wishart can be used when the mean is also unknown. It is however not so obvious how to assi gn values to the hyperparameters of these distributions. In particular, when forming non-informative limits of these distributions, the shape (or degrees of freedom) parameter of the Wishart must be handled with care. The intuitive solution of directly interpreting the shape as a pseudocount and letting it go to zero, as proposed by some authors, violates the restrictions on the shape parameter. We show how to use the scaled KL-divergence between multivariate Gaussians as an energy function to construct Wishart and normal-Wishart conjugate priors. When used as informative priors, the salient feature of these distributions is the mode, while the KL scaling factor serves as the pseudocount. The scale factor can be taken down to the limit at zero, to form non-informative priors that do not violate the restrictions on the Wishart shape parameter. This limit is non-informative in the sense that the posterior mode is identical to the maximum likelihood estimate of the parameters of the Gaussian.

التعلم الالي التعلم الآلي نظرية الإحصاء

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الوادي الدولية الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Direct Density-Derivative Estimation and Its Application in KL-Divergence Approximation

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً