
A Power Analysis of the Conditional Randomization Test and Knockoffs

Published by Wenshuo Wang
Publication date 2020
Research field Mathematical Statistics
Research language English





In many scientific problems, researchers try to relate a response variable $Y$ to a set of potential explanatory variables $X = (X_1,\dots,X_p)$, and start by trying to identify variables that contribute to this relationship. In statistical terms, this goal can be posed as trying to identify the $X_j$'s upon which $Y$ is conditionally dependent. Sometimes it is of value to test simultaneously for each $j$, which is more commonly known as variable selection. The conditional randomization test (CRT) and model-X knockoffs are two recently proposed methods that respectively perform conditional independence testing and variable selection by, for each $X_j$, computing any test statistic on the data and assessing that test statistic's significance by comparing it to test statistics computed on synthetic variables generated using knowledge of $X$'s distribution. Our main contribution is to analyze their power in a high-dimensional linear model where the ratio of the dimension $p$ to the sample size $n$ converges to a positive constant. We give explicit expressions of the asymptotic power of the CRT, variable selection with CRT $p$-values, and model-X knockoffs, each with a test statistic based on either the marginal covariance, the least squares coefficient, or the lasso. One useful application of our analysis is the direct theoretical comparison of the asymptotic powers of variable selection with CRT $p$-values and model-X knockoffs; in the instances with independent covariates that we consider, the CRT provably dominates knockoffs. We also analyze the power gain from using unlabeled data in the CRT when limited knowledge of $X$'s distribution is available, and the power of the CRT when samples are collected retrospectively.
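The resampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, purely for illustration, that the covariates are independent standard normals (so the conditional law of $X_j$ given the other covariates is $N(0,1)$), and it uses the absolute marginal covariance as the test statistic. The function name `crt_pvalue` and its parameters are hypothetical.

```python
import numpy as np

def crt_pvalue(X, y, j, n_resamples=500, seed=None):
    """Sketch of a CRT p-value for H0: Y is independent of X_j given X_{-j}.

    Assumption (illustrative only): the columns of X are independent N(0, 1),
    so X_j given the remaining columns is simply N(0, 1).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    def stat(xj):
        # Absolute marginal covariance between a candidate column and y.
        return abs(xj @ y) / n

    t_obs = stat(X[:, j])
    # Recompute the statistic on synthetic copies of X_j drawn from its
    # (assumed known) conditional distribution.
    t_null = np.array([stat(rng.standard_normal(n)) for _ in range(n_resamples)])
    # Finite-sample valid p-value: rank of the observed statistic among resamples.
    return (1 + np.sum(t_null >= t_obs)) / (n_resamples + 1)

# Usage: a linear model in which only the first covariate matters.
rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.standard_normal((n, p))
y = 1.0 * X[:, 0] + rng.standard_normal(n)
p_signal = crt_pvalue(X, y, j=0, seed=1)
p_null = crt_pvalue(X, y, j=1, seed=1)
```

With dependent covariates, the resampling step would instead draw from the conditional distribution of $X_j$ given the other columns, which is where knowledge of $X$'s distribution enters.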




Read also

Two-sample tests are among the most classical topics in statistics and remain widely used, even in cutting-edge applications. There are at least two modes of inference used to justify two-sample tests. One is the usual superpopulation inference, which assumes the units are independent and identically distributed (i.i.d.) samples from some superpopulation; the other is finite population inference, which relies on the random assignment of units to different groups. When randomization is actually implemented, the latter has the advantage of avoiding distributional assumptions on the outcomes. In this paper, we focus on finite population inference for censored outcomes, which has been less explored in the literature. Moreover, we allow the censoring time to depend on treatment assignment, under which exact permutation inference is unachievable. We find that, surprisingly, the usual logrank test can also be justified by randomization. Specifically, under a Bernoulli randomized experiment with non-informative i.i.d. censoring within each treatment arm, the logrank test is asymptotically valid for testing Fisher's null hypothesis of no treatment effect on any unit. Moreover, the asymptotic validity of the logrank test does not require any distributional assumption on the potential event times. We further extend the theory to the stratified logrank test, which is useful for randomized blocked designs and when censoring mechanisms vary across strata. In sum, the developed theory for the logrank test from finite population inference supplements its classical theory from the usual superpopulation inference, and helps provide a broader justification for the logrank test.
H. Dette, B. Hetzler, 2008
In the common nonparametric regression model the problem of testing for a specific parametric form of the variance function is considered. Recently Dette and Hetzler (2008) proposed a test statistic, which is based on an empirical process of pseudo residuals. The process converges weakly to a Gaussian process with a complicated covariance kernel depending on the data generating process. In the present paper we consider a standardized version of this process and propose a martingale transform to obtain asymptotically distribution free tests for the corresponding Kolmogorov-Smirnov and Cramér-von-Mises functionals. The finite sample properties of the proposed tests are investigated by means of a simulation study.
Rui Wang, Wangli Xu, 2021
This paper is concerned with the problem of comparing the population means of two groups of independent observations. An approximate randomization test procedure based on the test statistic of Chen & Qin (2010) is proposed. The asymptotic behavior of the test statistic as well as the randomized statistic is studied under weak conditions. In our theoretical framework, observations are not assumed to be identically distributed even within groups. No condition on the eigenstructure of the covariance is imposed, and the sample sizes of the two groups are allowed to be unbalanced. Under general conditions, all possible asymptotic distributions of the test statistic are obtained. We derive the asymptotic level and local power of the proposed test procedure. Our theoretical results show that the proposed test procedure can adapt to all possible asymptotic distributions of the test statistic and always has the correct test level asymptotically. Also, the proposed test procedure has good power behavior. Our numerical experiments show that the proposed test procedure has favorable performance compared with several alternative test procedures.
Dennis Leung, Qi-Man Shao, 2017
Let $\mathbf{R}$ be the Pearson correlation matrix of $m$ normal random variables. Rao's score test for the independence hypothesis $H_0 : \mathbf{R} = \mathbf{I}_m$, where $\mathbf{I}_m$ is the identity matrix of dimension $m$, was first considered by Schott (2005) in the high dimensional setting. In this paper, we study the asymptotic minimax power function of this test, under an asymptotic regime in which both $m$ and the sample size $n$ tend to infinity with the ratio $m/n$ upper bounded by a constant. In particular, our result implies that Rao's score test is rate-optimal for detecting the dependency signal $\|\mathbf{R} - \mathbf{I}_m\|_F$ of order $\sqrt{m/n}$, where $\|\cdot\|_F$ is the matrix Frobenius norm.
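To make the dependency signal concrete, the following sketch computes a sample analogue of $\|\mathbf{R} - \mathbf{I}_m\|_F$ from a data matrix. It is only an illustration of the quantity named in the abstract, not the score test itself; the function name `frobenius_dependency_signal` is hypothetical, and plugging in the sample correlation matrix is an assumption made here.

```python
import numpy as np

def frobenius_dependency_signal(X):
    """Frobenius distance between the sample correlation matrix and the identity.

    X has shape (n, m): n observations of m variables.
    """
    R = np.corrcoef(X, rowvar=False)  # m x m sample Pearson correlation matrix
    m = R.shape[0]
    return np.linalg.norm(R - np.eye(m), ord="fro")

# Usage: two perfectly correlated columns give off-diagonal entries of 1,
# so the squared norm is 1^2 + 1^2 = 2.
X_demo = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
signal = frobenius_dependency_signal(X_demo)
```

Because the diagonal of a correlation matrix is all ones, the squared norm equals twice the sum of squared correlations over pairs $i < j$.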
Yinqiu He, Zi Wang, 2020
The likelihood ratio test is widely used in exploratory factor analysis to assess the model fit and determine the number of latent factors. Despite its popularity and clear statistical rationale, researchers have found that when the dimension of the response data is large compared to the sample size, the classical chi-square approximation of the likelihood ratio test statistic often fails. Theoretically, it has been an open problem to determine when this phenomenon occurs as the dimension of the data increases; practically, the effect of high dimensionality is less examined in exploratory factor analysis, and there is no clear statistical guideline on the validity of the conventional chi-square approximation. To address this problem, we investigate the failure of the chi-square approximation of the likelihood ratio test in high-dimensional exploratory factor analysis, and derive the necessary and sufficient condition to ensure the validity of the chi-square approximation. The results yield simple quantitative guidelines to check in practice and also provide useful statistical insights into the practice of exploratory factor analysis.