
Generalized Permutation Framework for Testing Model Variable Significance

Posted by Yue Wu
Publication date: 2021
Research field: Mathematical Statistics
Paper language: English





A common problem in machine learning is determining whether a variable significantly contributes to a model's prediction performance. This problem is aggravated for datasets, such as gene expression datasets, that suffer the worst case of the curse of dimensionality: a low number of observations along with a high number of possible explanatory variables. In such scenarios, traditional methods for testing variable statistical significance or constructing variable confidence intervals do not apply. To address these problems, we developed a novel permutation framework for testing the significance of variables in supervised models. Our permutation framework has three main advantages. First, it is non-parametric and does not rely on distributional assumptions or asymptotic results. Second, it not only ranks model variables in terms of relative importance, but also tests for the statistical significance of each variable. Third, it can test for the significance of the interaction between model variables. We applied this permutation framework to multi-class classification of the Iris flower dataset and of brain regions in RNA expression data, and used it to show variable-level statistical significance and interactions.
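The abstract only outlines the approach; as a minimal sketch of the general permutation idea (a generic single-variable permutation test, not the authors' exact framework), one can permute one variable at a time and compare the resulting prediction scores against the unpermuted baseline. The scikit-learn-style model, the accuracy metric, and the helper name below are assumptions for illustration:

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score

def permutation_variable_pvalue(model, X, y, var_idx, n_permutations=1000, seed=0):
    """Generic single-variable permutation test (illustrative, not the paper's exact method).

    Fits `model` once, then repeatedly permutes column `var_idx` of X and records the
    accuracy obtained on the permuted data. The p-value is the fraction of permutations
    in which the permuted data score at least as well as the original data.
    """
    rng = np.random.default_rng(seed)
    fitted = clone(model).fit(X, y)
    baseline = accuracy_score(y, fitted.predict(X))

    null_scores = np.empty(n_permutations)
    X_perm = X.copy()
    for b in range(n_permutations):
        X_perm[:, var_idx] = rng.permutation(X[:, var_idx])  # break this variable's link to y
        null_scores[b] = accuracy_score(y, fitted.predict(X_perm))

    p_value = (1 + np.sum(null_scores >= baseline)) / (n_permutations + 1)
    return baseline - null_scores.mean(), p_value
```

For the Iris example mentioned above, one might call, say, `permutation_variable_pvalue(LogisticRegression(max_iter=500), X, y, var_idx=2)`; both the model choice and the column index are arbitrary here. Scoring on the training data, as done above for brevity, can be optimistic; held-out data or cross-validation would be the more careful choice.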




Read also

Xiaoyu Ma, Lu Lin, Yujie Gai (2021)
In the research field of big data, one of the important issues is how to recover the sequentially changing sets of true features when the data sets arrive sequentially. This paper presents a general framework for online updating of variable selection and parameter estimation in generalized linear models with streaming datasets. The framework is a type of online updating of penalized likelihoods with differentiable or non-differentiable penalty functions. An online updating coordinate descent algorithm is proposed to solve the online updating optimization problem. Moreover, a tuning parameter selection procedure is suggested in an online updating manner. Selection and estimation consistency and the oracle property are established theoretically. Our methods are further examined and illustrated by various numerical examples from both simulation experiments and a real data analysis.
Yinrui Sun, Hangjin Jiang (2020)
In the era of big data, variable selection is a key technology for handling high-dimensional problems with a small sample size but a large number of covariates. Different variable selection methods have been proposed for different models, such as the linear model, the logistic model and the generalized linear model. However, fewer works have focused on variable selection for single index models, and especially the single index logistic model, due to the difficulty arising from the unknown link function and the slow mixing rate of MCMC algorithms for the traditional logistic model. In this paper, we propose a Bayesian variable selection procedure for the single index logistic model by taking advantage of Gaussian processes and data augmentation. Numerical results from simulations and real data analysis show the advantage of our method over the state of the art.
Atsushi Nishizawa (2017)
The direct detection of gravitational waves (GW) from merging binary black holes and neutron stars marks the beginning of a new era in gravitational physics, and it brings forth new opportunities to test theories of gravity. To this end, it is crucial to search for anomalous deviations from general relativity in a model-independent way, irrespective of gravity theories, GW sources, and background spacetimes. In this paper, we propose a new universal framework for testing gravity with GW, based on the generalized propagation of a GW in an effective field theory that describes the modification of gravity at cosmological scales. We then perform a parameter estimation study, showing how well future observations of GW can constrain the model parameters in the generalized models of GW propagation.
Recently, a so-called E-MS algorithm was developed for model selection in the presence of missing data. Specifically, it performs the Expectation step (E step) and Model Selection step (MS step) alternately to find the minimum point of the observed generalized information criterion (GIC). In practice, it can be numerically infeasible to perform the MS step in high-dimensional settings. In this paper, we propose a simpler and more feasible generalized EMS (GEMS) algorithm which only requires a decrease in the observed GIC in the MS step and includes the original EMS algorithm as a special case. We obtain several numerical convergence results for the GEMS algorithm under mild conditions. We apply the proposed GEMS algorithm to Gaussian graphical model selection and variable selection in generalized linear models and compare it with existing competitors via numerical experiments. We illustrate its application with three real data sets.
We propose a general new method, the conditional permutation test, for testing the conditional independence of variables $X$ and $Y$ given a potentially high-dimensional random vector $Z$ that may contain confounding factors. The proposed test permutes entries of $X$ non-uniformly, so as to respect the existing dependence between $X$ and $Z$ and thus account for the presence of these confounders. Like the conditional randomization test of Candès et al. (2018), our test relies on the availability of an approximation to the distribution of $X \mid Z$. While the test of Candès et al. (2018) uses this estimate to draw new $X$ values, our test uses this approximation to design an appropriate non-uniform distribution on permutations of the $X$ values already seen in the true data. We provide an efficient Markov chain Monte Carlo sampler for the implementation of our method, and establish bounds on the Type I error in terms of the error in the approximation of the conditional distribution of $X \mid Z$, finding that, for the worst-case test statistic, the inflation in Type I error of the conditional permutation test is no larger than that of the conditional randomization test. We validate these theoretical results with experiments on simulated data and on the Capital Bikeshare data set.
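As a rough illustration of the non-uniform permutation idea described in this abstract (not the paper's exact MCMC sampler), the sketch below models $X \mid Z$ with a crude linear Gaussian fit, which is purely an assumption for illustration, and runs a Metropolis chain over pairwise swaps so that permutations consistent with the estimated conditional distribution are favored:

```python
import numpy as np

def sample_conditional_permutation(x, z, n_steps=2000, seed=0):
    """Draw one permutation of x that roughly respects an estimated X | Z relationship.

    Simplified sketch, not the paper's exact sampler: X | Z is modelled as a linear
    Gaussian fitted by least squares, and a Metropolis chain proposes single pairwise
    swaps, accepting those that are plausible under the estimated conditional density.
    """
    rng = np.random.default_rng(seed)
    n = len(x)

    # Crude plug-in model for X | Z: linear mean with constant variance (assumption).
    Zmat = np.column_stack([np.ones(n), z])
    beta, *_ = np.linalg.lstsq(Zmat, x, rcond=None)
    mu = Zmat @ beta
    sigma2 = np.mean((x - mu) ** 2)

    def log_q(value, mean):
        return -0.5 * (value - mean) ** 2 / sigma2  # Gaussian log-density up to a constant

    perm = np.arange(n)
    for _ in range(n_steps):
        i, j = rng.choice(n, size=2, replace=False)
        # Log-odds of swapping the x values currently assigned to positions i and j.
        current = log_q(x[perm[i]], mu[i]) + log_q(x[perm[j]], mu[j])
        proposed = log_q(x[perm[j]], mu[i]) + log_q(x[perm[i]], mu[j])
        if np.log(rng.uniform()) < proposed - current:
            perm[i], perm[j] = perm[j], perm[i]
    return x[perm]
```

Each draw from this sampler can stand in for $X$ in a test statistic, with a p-value computed against many such draws; in the paper itself the conditional distribution of $X \mid Z$ is supplied as an approximation rather than estimated by this crude linear fit.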