ترغب بنشر مسار تعليمي؟ اضغط هنا

A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model

208   0   0.0 ( 0 )
 نشر من قبل Dan M. Kluger
 تاريخ النشر 2021
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

The Benjamini-Hochberg (BH) procedure remains widely popular despite having limited theoretical guarantees in the commonly encountered scenario of correlated test statistics. Of particular concern is the possibility that the method could exhibit bursty behavior, meaning that it might typically yield no false discoveries while occasionally yielding both a large number of false discoveries and a false discovery proportion (FDP) that far exceeds its own well controlled mean. In this paper, we investigate which test statistic correlation structures lead to bursty behavior and which ones lead to well controlled FDPs. To this end, we develop a central limit theorem for the FDP in a multiple testing setup where the test statistic correlations can be either short-range or long-range as well as either weak or strong. The theorem and our simulations from a data-driven factor model suggest that the BH procedure exhibits severe burstiness when the test statistics have many strong, long-range correlations, but does not otherwise.



قيم البحث

اقرأ أيضاً

195 - Shige Peng 2008
We describe a new framework of a sublinear expectation space and the related notions and results of distributions, independence. A new notion of G-distributions is introduced which generalizes our G-normal-distribution in the sense that mean-uncertai nty can be also described. W present our new result of central limit theorem under sublinear expectation. This theorem can be also regarded as a generalization of the law of large number in the case of mean-uncertainty.
The knockoff-based multiple testing setup of Barber & Candes (2015) for variable selection in multiple regression where sample size is as large as the number of explanatory variables is considered. The method of Benjamini & Hochberg (1995) based on o rdinary least squares estimates of the regression coefficients is adjusted to the setup, transforming it to a valid p-value based false discovery rate controlling method not relying on any specific correlation structure of the explanatory variables. Simulations and real data applications show that our proposed method that is agnostic to {pi}0, the proportion of unimportant explanatory variables, and a data-adaptive version of it that uses an estimate of {pi}0 are powerful competitors of the false discovery rate controlling method in Barber & Candes (2015).
We establish a central limit theorem for (a sequence of) multivariate martingales which dimension potentially grows with the length $n$ of the martingale. A consequence of the results are Gaussian couplings and a multiplier bootstrap for the maximum of a multivariate martingale whose dimensionality $d$ can be as large as $e^{n^c}$ for some $c>0$. We also develop new anti-concentration bounds for the maximum component of a high-dimensional Gaussian vector, which we believe is of independent interest. The results are applicable to a variety of settings. We fully develop its use to the estimation of context tree models (or variable length Markov chains) for discrete stationary time series. Specifically, we provide a bootstrap-based rule to tune several regularization parameters in a theoretically valid Lepski-type method. Such bootstrap-based approach accounts for the correlation structure and leads to potentially smaller penalty choices, which in turn improve the estimation of the transition probabilities.
Multiple hypothesis testing, a situation when we wish to consider many hypotheses, is a core problem in statistical inference that arises in almost every scientific field. In this setting, controlling the false discovery rate (FDR), which is the expe cted proportion of type I error, is an important challenge for making meaningful inferences. In this paper, we consider the problem of controlling FDR in an online manner. Concretely, we consider an ordered, possibly infinite, sequence of hypotheses, arriving one at each timestep, and for each hypothesis we observe a p-value along with a set of features specific to that hypothesis. The decision whether or not to reject the current hypothesis must be made immediately at each timestep, before the next hypothesis is observed. The model of multi-dimensional feature set provides a very general way of leveraging the auxiliary information in the data which helps in maximizing the number of discoveries. We propose a new class of powerful online testing procedures, where the rejections thresholds (significance levels) are learnt sequentially by incorporating contextual information and previous results. We prove that any rule in this class controls online FDR under some standard assumptions. We then focus on a subclass of these procedures, based on weighting significance levels, to derive a practical algorithm that learns a parametric weight function in an online fashion to gain more discoveries. We also theoretically prove, in a stylized setting, that our proposed procedures would lead to an increase in the achieved statistical power over a popular online testing procedure proposed by Javanmard & Montanari (2018). Finally, we demonstrate the favorable performance of our procedure, by comparing it to state-of-the-art online multiple testing procedures, on both synthetic data and real data generated from different applications.
Differential privacy provides a rigorous framework for privacy-preserving data analysis. This paper proposes the first differentially private procedure for controlling the false discovery rate (FDR) in multiple hypothesis testing. Inspired by the Ben jamini-Hochberg procedure (BHq), our approach is to first repeatedly add noise to the logarithms of the $p$-values to ensure differential privacy and to select an approximately smallest $p$-value serving as a promising candidate at each iteration; the selected $p$-values are further supplied to the BHq and our private procedure releases only the rejected ones. Moreover, we develop a new technique that is based on a backward submartingale for proving FDR control of a broad class of multiple testing procedures, including our private procedure, and both the BHq step-up and step-down procedures. As a novel aspect, the proof works for arbitrary dependence between the true null and false null test statistics, while FDR control is maintained up to a small multiplicative factor.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا