A common problem in machine learning is determining whether a variable contributes significantly to a model's predictive performance. This problem is aggravated for datasets, such as gene expression datasets, that suffer the worst case of the curse of dimensionality: a low number of observations along with a high number of possible explanatory variables. In such scenarios, traditional methods for testing the statistical significance of variables or constructing confidence intervals for them do not apply. To address these problems, we developed a novel permutation framework for testing the significance of variables in supervised models. Our permutation framework has three main advantages. First, it is non-parametric and does not rely on distributional assumptions or asymptotic results. Second, it not only ranks model variables in terms of relative importance, but also tests the statistical significance of each variable. Third, it can test the significance of interactions between model variables. We applied this permutation framework to multi-class classification of the Iris flower dataset and of brain regions in RNA expression data and, using this framework, showed variable-level statistical significance and interactions.
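The general idea of a permutation test for a single variable can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework described in the abstract: the nearest-centroid classifier, the in-sample accuracy score, the toy data, and every function name (`fit_centroids`, `permutation_p_value`, and so on) are hypothetical choices made for brevity. A real analysis would more plausibly use a held-out or cross-validated score.

```python
import random

def fit_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
    )

def accuracy(X, y):
    """In-sample accuracy of a freshly fitted model (an assumption made for brevity)."""
    centroids = fit_centroids(X, y)
    return sum(predict(centroids, r) == t for r, t in zip(X, y)) / len(y)

def permutation_p_value(X, y, column, n_permutations=200, seed=0):
    """Estimate how often shuffling one column scores at least as well as the real data."""
    rng = random.Random(seed)
    observed = accuracy(X, y)
    values = [row[column] for row in X]
    at_least_as_good = 0
    for _ in range(n_permutations):
        shuffled = values[:]
        rng.shuffle(shuffled)  # break the link between this column and the labels
        X_perm = [
            row[:column] + [v] + row[column + 1:]
            for row, v in zip(X, shuffled)
        ]
        if accuracy(X_perm, y) >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_good + 1) / (n_permutations + 1)

# Toy data (hypothetical): column 0 separates the two classes, column 1 does not.
X = [[i * 0.1, float(i % 5)] for i in range(20)] + \
    [[5.0 + i * 0.1, float(i % 5)] for i in range(20)]
y = [0] * 20 + [1] * 20

p_informative = permutation_p_value(X, y, column=0)
p_uninformative = permutation_p_value(X, y, column=1)
print(p_informative, p_uninformative)
```

On this toy data, the informative column should receive a small p-value, because shuffling it almost always lowers accuracy, while the uninformative column should receive a large one, because shuffling it leaves accuracy unchanged. No distributional assumption is needed: the null distribution is built from the permuted scores themselves.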
In big data research, one of the important issues is how to recover the sequentially changing sets of true features when the data sets arrive sequentially. The paper presents a general framework for online updating variable selection and par
In the era of big data, variable selection is a key technology for handling high-dimensional problems with a small sample size but a large number of covariates. Different variable selection methods have been proposed for different models, such as linear
The direct detection of gravitational waves (GW) from merging binary black holes and neutron stars marks the beginning of a new era in gravitational physics, and it brings forth new opportunities to test theories of gravity. To this end, it is crucial
Recently, a so-called E-MS algorithm was developed for model selection in the presence of missing data. Specifically, it performs the Expectation step (E step) and Model Selection step (MS step) alternately to find the minimum point of the observed g
We propose a general new method, the conditional permutation test, for testing the conditional independence of variables $X$ and $Y$ given a potentially high-dimensional random vector $Z$ that may contain confounding factors. The proposed test permut