ترغب بنشر مسار تعليمي؟ اضغط هنا

Identifying important predictors in large data bases -- multiple testing and model selection

55   0   0.0 ( 0 )
 نشر من قبل Florian Frommlet
 تاريخ النشر 2020
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

This is a chapter of the forthcoming Handbook of Multiple Testing. We consider a variety of model selection strategies in a high-dimensional setting, where the number of potential predictors p is large compared to the number of available observations n. In particular modifications of information criteria which are suitable in case of p > n are introduced and compared with a variety of penalized likelihood methods, in particular SLOPE and SLOBE. The focus is on methods which control the FDR in terms of model identification. Theoretical results are provided both with respect to model identification and prediction and various simulation results are presented which illustrate the performance of the different methods in different situations.



قيم البحث

اقرأ أيضاً

In this paper, we derive optimal designs for the Rasch Poisson counts model and the Rasch Poisson-Gamma counts model incorporating several binary predictors for the difficulty parameter. To efficiently estimate the regression coefficients of the pred ictors, locally D-optimal designs are developed. After an introduction to the Rasch Poisson counts model and the Rasch Poisson-Gamma counts model we will specify these models as a particular generalized linear mixed model. Based on this embedding optimal designs for both models including several binary explanatory variables will be presented. Therefore, we will derive conditions on the effect sizes of certain designs to be locally D-optimal. Finally, it is pointed out that the results derived for the Rasch Poisson models can be applied for more general Poisson regression models which should receive more attention in future psychological research.
Standardization has been a widely adopted practice in multiple testing, for it takes into account the variability in sampling and makes the test statistics comparable across different study units. However, despite conventional wisdom to the contrary, we show that there can be a significant loss in information from basing hypothesis tests on standardized statistics rather than the full data. We develop a new class of heteroscedasticity--adjusted ranking and thresholding (HART) rules that aim to improve existing methods by simultaneously exploiting commonalities and adjusting heterogeneities among the study units. The main idea of HART is to bypass standardization by directly incorporating both the summary statistic and its variance into the testing procedure. A key message is that the variance structure of the alternative distribution, which is subsumed under standardized statistics, is highly informative and can be exploited to achieve higher power. The proposed HART procedure is shown to be asymptotically valid and optimal for false discovery rate (FDR) control. Our simulation results demonstrate that HART achieves substantial power gain over existing methods at the same FDR level. We illustrate the implementation through a microarray analysis of myeloma.
In meta-analyses, publication bias is a well-known, important and challenging issue because the validity of the results from a meta-analysis is threatened if the sample of studies retrieved for review is biased. One popular method to deal with public ation bias is the Copas selection model, which provides a flexible sensitivity analysis for correcting the estimates with considerable insight into the data suppression mechanism. However, rigorous testing procedures under the Copas selection model to detect bias are lacking. To fill this gap, we develop a score-based test for detecting publication bias under the Copas selection model. We reveal that the behavior of the standard score test statistic is irregular because the parameters of the Copas selection model disappear under the null hypothesis, leading to an identifiability problem. We propose a novel test statistic and derive its limiting distribution. A bootstrap procedure is provided to obtain the p-value of the test for practical applications. We conduct extensive Monte Carlo simulations to evaluate the performance of the proposed test and apply the method to several existing meta-analyses.
We propose a new adaptive empirical Bayes framework, the Bag-Of-Null-Statistics (BONuS) procedure, for multiple testing where each hypothesis testing problem is itself multivariate or nonparametric. BONuS is an adaptive and interactive knockoff-type method that helps improve the testing power while controlling the false discovery rate (FDR), and is closely connected to the counting knockoffs procedure analyzed in Weinstein et al. (2017). Contrary to procedures that start with a $p$-value for each hypothesis, our method analyzes the entire data set to adaptively estimate an optimal $p$-value transform based on an empirical Bayes model. Despite the extra adaptivity, our method controls FDR in finite samples even if the empirical Bayes model is incorrect or the estimation is poor. An extension, the Double BONuS procedure, validates the empirical Bayes model to guard against power loss due to model misspecification.
60 - Ray Bai , Malay Ghosh 2018
We revisit the problem of simultaneously testing the means of $n$ independent normal observations under sparsity. We take a Bayesian approach to this problem by introducing a scale-mixture prior known as the normal-beta prime (NBP) prior. We first de rive new concentration properties when the beta prime density is employed for a scale parameter in Bayesian hierarchical models. To detect signals in our data, we then propose a hypothesis test based on thresholding the posterior shrinkage weight under the NBP prior. Taking the loss function to be the expected number of misclassified tests, we show that our test procedure asymptotically attains the optimal Bayes risk when the signal proportion $p$ is known. When $p$ is unknown, we introduce an empirical Bayes variant of our test which also asymptotically attains the Bayes Oracle risk in the entire range of sparsity parameters $p propto n^{-epsilon}, epsilon in (0, 1)$. Finally, we also consider restricted marginal maximum likelihood (REML) and hierarchical Bayes approaches for estimating a key hyperparameter in the NBP prior and examine multiple testing under these frameworks.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا