No Arabic abstract
We study a stylized multiple testing problem where the test statistics are independent and assumed to have the same distribution under their respective null hypotheses. We first show that, in the normal means model where the test statistics are normal Z-scores, the well-known method of (Benjamini and Hochberg, 1995) is optimal in some asymptotic sense. We then show that this is also the case of a recent distribution-free method proposed by Foygel-Barber and Cand`es (2015). The method is distribution-free in the sense that it is agnostic to the null distribution - it only requires that the null distribution be symmetric. We extend these optimality results to other location models with a base distribution having fast-decaying tails.
We study an online multiple testing problem where the hypotheses arrive sequentially in a stream. The test statistics are independent and assumed to have the same distribution under their respective null hypotheses. We investigate two procedures LORD and LOND, proposed by (Javanmard and Montanari, 2015), which are proved to control the FDR in an online manner. In some (static) model, we show that LORD is optimal in some asymptotic sense, in particular as powerful as the (static) Benjamini-Hochberg procedure to first asymptotic order. We also quantify the performance of LOND. Some numerical experiments complement our theory.
In a multiple testing framework, we propose a method that identifies the interval with the highest estimated false discovery rate of P-values and rejects the corresponding null hypotheses. Unlike the Benjamini-Hochberg method, which does the same but over intervals with an endpoint at the origin, the new procedure `scans all intervals. In parallel with citep*{storey2004strong}, we show that this scan procedure provides strong control of asymptotic false discovery rate. In addition, we investigate its asymptotic false non-discovery rate, deriving conditions under which it outperforms the Benjamini-Hochberg procedure. For example, the scan procedure is superior in power-law location models.
This paper considers Bayesian multiple testing under sparsity for polynomial-tailed distributions satisfying a monotone likelihood ratio property. Included in this class of distributions are the Students t, the Pareto, and many other distributions. We prove some general asymptotic optimality results under fixed and random thresholding. As examples of these general results, we establish the Bayesian asymptotic optimality of several multiple testing procedures in the literature for appropriately chosen false discovery rate levels. We also show by simulation that the Benjamini-Hochberg procedure with a false discovery rate level different from the asymptotically optimal one can lead to high Bayes risk.
We study the well-known problem of estimating a sparse $n$-dimensional unknown mean vector $theta = (theta_1, ..., theta_n)$ with entries corrupted by Gaussian white noise. In the Bayesian framework, continuous shrinkage priors which can be expressed as scale-mixture normal densities are popular for obtaining sparse estimates of $theta$. In this article, we introduce a new fully Bayesian scale-mixture prior known as the inverse gamma-gamma (IGG) prior. We prove that the posterior distribution contracts around the true $theta$ at (near) minimax rate under very mild conditions. In the process, we prove that the sufficient conditions for minimax posterior contraction given by Van der Pas et al. (2016) are not necessary for optimal posterior contraction. We further show that the IGG posterior density concentrates at a rate faster than those of the horseshoe or the horseshoe+ in the Kullback-Leibler (K-L) sense. To classify true signals ($theta_i eq 0$), we also propose a hypothesis test based on thresholding the posterior mean. Taking the loss function to be the expected number of misclassified tests, we show that our test procedure asymptotically attains the optimal Bayes risk exactly. We illustrate through simulations and data analysis that the IGG has excellent finite sample performance for both estimation and classification.
This paper investigates the problem of testing independence of two random vectors of general dimensions. For this, we give for the first time a distribution-free consistent test. Our approach combines distance covariance with the center-outward ranks and signs developed in Hallin (2017). In technical terms, the proposed test is consistent and distribution-free in the family of multivariate distributions with nonvanishing (Lebesgue) probability densities. Exploiting the (degenerate) U-statistic structure of the distance covariance and the combinatorial nature of Hallins center-outward ranks and signs, we are able to derive the limiting null distribution of our test statistic. The resulting asymptotic approximation is accurate already for moderate sample sizes and makes the test implementable without requiring permutation. The limiting distribution is derived via a more general result that gives a new type of combinatorial non-central limit theorem for double- and multiple-indexed permutation statistics.