TCMI: a non-parametric mutual-dependence estimator for multivariate continuous distributions

108 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Benjamin Regler

تاريخ النشر 2020

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Benjamin Regler - Matthias Scheffler - Luca M. Ghiringhelli

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The identification of relevant features, i.e., the driving variables that determine a process or the property of a system, is an essential part of the analysis of data sets whose entries are described by a large number of variables. The preferred measure for quantifying the relevance of nonlinear statistical dependencies is mutual information, which requires as input probability distributions. Probability distributions cannot be reliably sampled and estimated from limited data, especially for real-valued data samples such as lengths or energies. Here, we introduce total cumulative mutual information (TCMI), a measure of the relevance of mutual dependencies based on cumulative probability distributions. TCMI can be estimated directly from sample data and is a non-parametric, robust and deterministic measure that facilitates comparisons and rankings between feature sets with different cardinality. The ranking induced by TCMI allows for feature selection, i.e., the identification of the set of relevant features that are statistical related to the process or the property of a system, while taking into account the number of data samples as well as the cardinality of the feature subsets. We evaluate the performance of our measure with simulated data, compare its performance with similar multivariate dependence measures, and demonstrate the effectiveness of our feature selection method on a set of standard data sets and a typical scenario in materials science.

قيم البحث

121 - Janne Leppa-aho , Santeri Raisanen , Xiao Yang 2017

We propose a method for learning Markov network structures for continuous data without invoking any assumptions about the distribution of the variables. The method makes use of previous work on a non-parametric estimator for mutual information which is used to create a non-parametric test for multivariate conditional independence. This independence test is then combined with an efficient constraint-based algorithm for learning the graph structure. The performance of the method is evaluated on several synthetic data sets and it is shown to learn considerably more accurate structures than competing methods when the dependencies between the variables involve non-linearities.

التعلم الآلي نظرية المعلومات نظرية المعلومات

DP-GP-LVM: A Bayesian Non-Parametric Model for Learning Multivariate Dependency Structures

99 - Andrew R. Lawrence , Carl Henrik Ek , Neill D. F. Campbell 2018

We present a non-parametric Bayesian latent variable model capable of learning dependency structures across dimensions in a multivariate setting. Our approach is based on flexible Gaussian process priors for the generative mappings and interchangeabl e Dirichlet process priors to learn the structure. The introduction of the Dirichlet process as a specific structural prior allows our model to circumvent issues associated with previous Gaussian process latent variable models. Inference is performed by deriving an efficient variational bound on the marginal log-likelihood on the model.

التعلم الالي التعلم الآلي

Soft and subspace robust multivariate rank tests based on entropy regularized optimal transport

218 - Shoaib Bin Masud , Boyang Lyu , Shuchin Aeron 2021

In this paper, we extend the recently proposed multivariate rank energy distance, based on the theory of optimal transport, for statistical testing of distributional similarity, to soft rank energy distance. Being differentiable, this in turn allows us to extend the rank energy to a subspace robust rank energy distance, dubbed Projected soft-Rank Energy distance, which can be computed via optimization over the Stiefel manifold. We show via experiments that using projected soft rank energy one can trade-off the detection power vs the false alarm via projections onto an appropriately selected low dimensional subspace. We also show the utility of the proposed tests on unsupervised change point detection in multivariate time series data. All codes are publicly available at the link provided in the experiment section.

التعلم الالي نظرية المعلومات التعلم الآلي

Multivariate Log-Concave Distributions as a Nearly Parametric Model

473 - Dominic Schuhmacher , Andre Huesler , Lutz Duembgen 2009

In this paper we show that the family P_d of probability distributions on R^d with log-concave densities satisfies a strong continuity condition. In particular, it turns out that weak convergence within this family entails (i) convergence in total va riation distance, (ii) convergence of arbitrary moments, and (iii) pointwise convergence of Laplace transforms. Hence the nonparametric model P_d has similar properties as parametric models such as, for instance, the family of all d-variate Gaussian distributions.

الاحتمالات نظرية الإحصاء نظرية الإحصاء

Kafnets: kernel-based non-parametric activation functions for neural networks

150 - Simone Scardapane , Steven Van Vaerenbergh , Simone Totaro 2017

Neural networks are generally built by interleaving (adaptable) linear layers with (fixed) nonlinear activation functions. To increase their flexibility, several authors have proposed methods for adapting the activation functions themselves, endowing them with varying degrees of flexibility. None of these approaches, however, have gained wide acceptance in practice, and research in this topic remains open. In this paper, we introduce a novel family of flexible activation functions that are based on an inexpensive kernel expansion at every neuron. Leveraging over several properties of kernel-based models, we propose multiple variations for designing and initializing these kernel activation functions (KAFs), including a multidimensional scheme allowing to nonlinearly combine information from different paths in the network. The resulting KAFs can approximate any mapping defined over a subset of the real line, either convex or nonconvex. Furthermore, they are smooth over their entire domain, linear in their parameters, and they can be regularized using any known scheme, including the use of $ell_1$ penalties to enforce sparseness. To the best of our knowledge, no other known model satisfies all these properties simultaneously. In addition, we provide a relatively complete overview on alternative techniques for adapting the activation functions, which is currently lacking in the literature. A large set of experiments validates our proposal.

التعلم الالي الذكاء الاصطناعي التعلم الآلي