
Group Inference in High Dimensions with Applications to Hierarchical Testing

Published by: Claude Renaux
Publication date: 2019
Research field: Mathematical Statistics
Paper language: English





High-dimensional group inference is an essential part of statistical methods for analysing complex data sets, including hierarchical testing, tests of interaction, detection of heterogeneous treatment effects and inference for local heritability. Group inference in regression models can be measured with respect to a weighted quadratic functional of the regression sub-vector corresponding to the group. We construct asymptotically unbiased estimators of these weighted quadratic functionals and propose a novel inference procedure based on them. We derive its asymptotic Gaussian distribution, which enables the construction of asymptotically valid confidence intervals and tests that perform well in terms of length or power. The proposed test is computationally efficient even for large groups, statistically valid for any group size, and achieves good power when testing large groups containing many small regression coefficients. We apply the methodology to several statistical problems of interest and demonstrate its strength and usefulness on simulated and real data.
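As a rough formalization of the quantity being estimated (the weight matrix A below is an assumption of this sketch, not notation taken from the paper): for a group G of covariates with regression sub-vector \beta_G and a positive-definite weight matrix A, the functional and the group null hypothesis can be written as

\[
  Q_A(\beta) \;=\; \beta_G^{\top} A\, \beta_G ,
  \qquad
  H_0 :\; \beta_G = 0 \;\Longleftrightarrow\; Q_A(\beta) = 0 ,
\]

so an asymptotically Gaussian estimator of Q_A delivers a test and a confidence interval for the whole group at once, regardless of its size.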


Read also

Hierarchical inference in (generalized) regression problems is powerful for finding significant groups or even single covariates, especially in high-dimensional settings where identifiability of the entire regression parameter vector may be ill-posed. The general method proceeds in a fully data-driven and adaptive way from large to small groups or singletons of covariates, depending on the signal strength and the correlation structure of the design matrix. We propose a novel hierarchical multiple testing adjustment that can be used in combination with any significance test for a group of covariates to perform hierarchical inference. Our adjustment passes on the significance level of certain hypotheses that could not be rejected and is shown to guarantee strong control of the familywise error rate. Our method is at least as powerful as a so-called depth-wise hierarchical Bonferroni adjustment, and it provides a substantial gain in power over previously proposed hierarchical inheritance procedures when the underlying alternative hypotheses occur sparsely along a few branches of the tree-structured hierarchy.
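As a point of reference, here is a minimal sketch of top-down hierarchical testing with the depth-wise Bonferroni adjustment that the abstract uses as its baseline; test_group is a hypothetical callable returning a p-value for a set of covariate indices, and the paper's sharper inheritance-style adjustment is not reproduced here.

def hierarchical_test(node, test_group, alpha=0.05, denom=1.0):
    """Test `node` (a dict with 'indices' and optional 'children');
    descend into a branch only if its group hypothesis is rejected."""
    p_value = test_group(node["indices"])
    if p_value > alpha / denom:        # not significant: prune this branch
        return []
    significant = [node["indices"]]
    children = node.get("children", [])
    for child in children:
        # depth-wise Bonferroni: split the level among the children
        significant += hierarchical_test(child, test_group,
                                         alpha, denom * len(children))
    return significant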
We propose a new method, semi-penalized inference with direct false discovery rate control (SPIDR), for variable selection and confidence interval construction in high-dimensional linear regression. SPIDR first uses a semi-penalized approach to constructing estimators of the regression coefficients. We show that the SPIDR estimator is ideal in the sense that it equals an ideal least squares estimator with high probability under sparsity and other suitable conditions. Consequently, the SPIDR estimator is asymptotically normal. Based on this distributional result, SPIDR determines the selection rule by directly controlling the false discovery rate, which provides an explicit assessment of the selection error and naturally leads to confidence intervals for the selected coefficients with a proper confidence statement. We conduct simulation studies to evaluate its finite-sample performance and demonstrate its application on a breast cancer gene expression data set. Our simulation studies and data example suggest that SPIDR is a useful method for high-dimensional statistical inference in practice.
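The final step the abstract describes, selection by direct FDR control plus confidence intervals from asymptotic normality, can be sketched as follows; beta_hat and se are assumed inputs (coefficient estimates and their standard errors), and the Benjamini-Hochberg step-up rule stands in here for SPIDR's actual selection rule.

import numpy as np
from scipy.stats import norm

def fdr_select(beta_hat, se, q=0.1):
    """Select coefficients by an FDR rule; return indices and 95% intervals."""
    z = beta_hat / se
    pvals = 2 * norm.sf(np.abs(z))                  # two-sided p-values
    order = np.argsort(pvals)
    m = len(pvals)
    # Benjamini-Hochberg: reject the k smallest p-values, where k is the
    # largest index with p_(k) <= k/m * q
    passing = [i for i in range(m) if pvals[order[i]] <= (i + 1) / m * q]
    selected = order[: max(passing) + 1] if passing else np.array([], dtype=int)
    half = norm.ppf(0.975) * se[selected]
    ci = np.c_[beta_hat[selected] - half, beta_hat[selected] + half]
    return selected, ci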
Shulei Wang, Ming Yuan (2016)
Motivated by gene set enrichment analysis, we investigate the problem of combined hypothesis testing on a graph. We introduce a general framework to effectively use the structural information of the underlying graph when testing multivariate means. A new testing procedure is proposed within this framework. We show that the test is optimal in that it can consistently detect departure from the collective null at a rate that no other test could improve, for almost all graphs. We also provide general performance bounds for the proposed test under any specific graph, and illustrate their utility through several common types of graphs. Numerical experiments are presented to further demonstrate the merits of our approach.
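In symbols, the collective null being tested attaches a mean to each node of the graph; a minimal statement (the exact class of structured alternatives is an assumption of this sketch) is

\[
  H_0 :\; \mu_v = 0 \ \text{ for all } v \in V
  \qquad\text{vs.}\qquad
  H_1 :\; \mu_v \neq 0 \ \text{ for } v \text{ in some subset of } G=(V,E)
  \ \text{compatible with its structure},
\]

with the edge structure used to pool evidence across neighbouring nodes.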
Epidemiological forecasts are beset by uncertainties about the underlying epidemiological processes and about the surveillance process through which data are acquired. We present a Bayesian inference methodology that quantifies these uncertainties for epidemics modelled by (possibly) non-stationary, continuous-time, Markov population processes. The efficiency of the method derives from a functional central limit theorem approximation of the likelihood, valid for large populations. We demonstrate the methodology by analysing the early stages of the COVID-19 pandemic in the UK, based on age-structured data for the number of deaths. This includes maximum a posteriori estimates, MCMC sampling of the posterior, computation of the model evidence, and the determination of parameter sensitivities via the Fisher information matrix. Our methodology is implemented in PyRoss, an open-source platform for the analysis of epidemiological compartment models.
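A generic sketch of the inference pattern described, not PyRoss's actual API: the functional CLT replaces the intractable likelihood with a Gaussian whose mean and covariance come from the model, and the MAP estimate is found numerically. mean_traj, cov_traj and log_prior are hypothetical user-supplied functions.

import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(theta, data, mean_traj, cov_traj, log_prior):
    """Gaussian (CLT) approximation to the negative log-posterior."""
    mu = mean_traj(theta)                 # model-predicted observation means
    cov = cov_traj(theta)                 # CLT covariance of the observations
    resid = data - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = resid @ np.linalg.solve(cov, resid)
    return 0.5 * (logdet + quad) - log_prior(theta)

# MAP estimate from some starting point theta0:
# theta_map = minimize(neg_log_posterior, theta0,
#                      args=(data, mean_traj, cov_traj, log_prior)).x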
In the context of a pandemic like COVID-19, and until most people are vaccinated, proactive testing and interventions have proved to be the only means of containing the disease spread. Recent academic work offers significant evidence in this regard, but a critical question remains open: can we accurately identify all new infections that happen every day, without this being forbiddingly expensive, i.e., using only a fraction of the tests needed to test everyone every day (complete testing)? Group testing offers a powerful toolset for minimizing the number of tests, but it does not account for the time dynamics behind infections. Moreover, it typically assumes that people are infected independently, while infections are governed by community spread. Epidemiology, on the other hand, does explore time dynamics and community correlations through the well-established continuous-time SIR stochastic network model, but the standard model does not incorporate discrete-time testing and interventions. In this paper, we introduce a discrete-time SIR stochastic block model that also allows for group testing and interventions on a daily basis. Our model can be regarded as a discrete version of the continuous-time SIR stochastic network model over a specific type of weighted graph that captures the underlying community structure. We analyze this model with respect to the minimum number of group tests needed each day to identify all infections with vanishing error probability. We find that one can leverage knowledge of the community structure and the model to inform nonadaptive group testing algorithms that are order-optimal and therefore achieve the same performance as complete testing using a much smaller number of tests.
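For concreteness, a sketch of the nonadaptive decoding primitive that such schemes build on, a COMP-style decoder; forming pools within community blocks (so that correlated infections share pools) is an assumption of this sketch, not the paper's specific order-optimal design.

import numpy as np

def comp_decode(pool_matrix, pool_results):
    """COMP decoding for noiseless group tests.
    pool_matrix[t, i] = 1 if item i is in pool t; pool_results[t] is the
    OR of the true statuses in pool t. An item is declared negative iff
    it appears in at least one negative pool."""
    negative_pools = pool_matrix[np.asarray(pool_results) == 0]
    cleared = negative_pools.sum(axis=0) > 0     # seen in a negative pool
    return ~cleared                              # True = declared infected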