Motivated by an open problem of validating protein identities in label-free shotgun proteomics work-flows, we present a testing procedure to validate class/protein labels using available measurements across instances/peptides. More generally, we present a solution to the problem of identifying instances that are deemed, based on some distance (or quasi-distance) measure, as outliers relative to the subset of instances assigned to the same class. The proposed procedure is non-parametric and requires no specific distributional assumption on the measured distances. The only assumption underlying the testing procedure is that measured distances between instances within the same class are stochastically smaller than measured distances between instances from different classes. The test is shown to simultaneously control the Type I and Type II error probabilities whilst also controlling the overall error probability of the repeated testing invoked in the validation procedure of initial class labeling. The theoretical results are supplemented with results from an extensive numerical study, simulating a typical setup for labeling validation in proteomics work-flow applications. These results illustrate the applicability and viability of our method. Even with up to 25% of instances mislabeled, our testing procedure maintains a high specificity and greatly reduces the proportion of mislabeled instances.
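As a rough illustration of this kind of distance-based label validation (the function name, the Euclidean distance, and the use of a one-sided Mann-Whitney rank test are my own choices, not the authors' procedure), one could compare each instance's distances to the other members of its assigned class against that class's internal pairwise distances, and flag the instance when its distances are stochastically larger:

```python
# Minimal sketch, assuming a feature matrix X and class labels; not the paper's test.
import numpy as np
from scipy.stats import mannwhitneyu
from scipy.spatial.distance import cdist

def flag_mislabeled(X, labels, alpha=0.05):
    """Return a boolean array: True where instance i looks like an outlier
    relative to the instances sharing its class label."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    D = cdist(X, X)                                   # any (quasi-)distance matrix
    flagged = np.zeros(len(X), dtype=bool)
    for i in range(len(X)):
        members = np.flatnonzero((labels == labels[i]) & (np.arange(len(X)) != i))
        if members.size < 2:
            continue                                  # too few class members to test
        d_to_class = D[i, members]                    # distances from i to its class
        iu = np.triu_indices(members.size, k=1)
        d_within = D[np.ix_(members, members)][iu]    # class-internal pairwise distances
        # One-sided rank test: flag i if its distances look larger than the
        # class's internal distances.
        _, p = mannwhitneyu(d_to_class, d_within, alternative="greater")
        flagged[i] = p < alpha
    return flagged
```

Flagged instances would then be candidates for relabeling or removal before downstream analysis.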
Suppose several two-valued input-output systems are designed by setting the levels of several controllable factors. For this situation, the Taguchi method proposes assigning the controllable factors to an orthogonal array and using an ANOVA model for the standardized SN ratio, a natural measure for evaluating the performance of each input-output system. Although this procedure is simple and useful in applications, the results can be unreliable when the estimated standard errors of the standardized SN ratios are unbalanced. In this paper, we treat the data arising from full factorial or fractional factorial designs of several controllable factors as the frequencies of high-dimensional contingency tables, and propose a general testing procedure for the main effects or the interaction effects of the controllable factors.
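A minimal sketch of the contingency-table viewpoint, assuming a toy 2x2 full factorial with a binary output and made-up counts; the factor names and the use of a binomial GLM likelihood-ratio test are illustrative stand-ins, not the paper's general procedure:

```python
# Hedged sketch: treat the counts from a two-level factorial experiment with a
# binary output as a contingency table and test a factor effect by comparing
# nested binomial GLMs with a likelihood-ratio test.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

# Counts of output=1 ("good") and output=0 ("bad") for each of the four
# settings of two two-level controllable factors A and B (full factorial).
good = np.array([90, 90, 70, 95])
bad  = np.array([10, 10, 30,  5])
A    = np.array([0, 0, 1, 1])      # factor A level (0 = low, 1 = high)
B    = np.array([0, 1, 0, 1])      # factor B level

endog = np.column_stack([good, bad])                     # (success, failure) counts

X_full    = sm.add_constant(np.column_stack([A, B, A * B]))
X_reduced = sm.add_constant(np.column_stack([A, B]))

full    = sm.GLM(endog, X_full, family=sm.families.Binomial()).fit()
reduced = sm.GLM(endog, X_reduced, family=sm.families.Binomial()).fit()

# Likelihood-ratio test of the A:B interaction effect.
lr = 2 * (full.llf - reduced.llf)
print(f"LR = {lr:.2f}, p = {chi2.sf(lr, df=1):.4f}")
```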
Spatial extent inference (SEI) is widely used across neuroimaging modalities to study brain-phenotype associations that inform our understanding of disease. Recent studies have shown that Gaussian random field (GRF) based tools can have inflated family-wise error rates (FWERs). This has led to fervent discussion as to which preprocessing steps are necessary to control the FWER using GRF-based SEI. The failure of GRF-based methods is due to unrealistic assumptions about the covariance function of the imaging data. The permutation procedure is the most robust SEI tool because it estimates the covariance function from the imaging data. However, the permutation procedure can fail because its assumption of exchangeability is violated in many imaging modalities. Here, we propose the (semi-) parametric bootstrap joint (PBJ; sPBJ) testing procedures that are designed for SEI of multilevel imaging data. The sPBJ procedure uses a robust estimate of the covariance function, which yields consistent estimates of standard errors even if the covariance model is misspecified. We use our methods to study the association between performance and executive functioning in a working memory fMRI study. The sPBJ procedure is robust to variance misspecification and maintains nominal FWER in small samples, in contrast to the GRF methods. The sPBJ also has equal or superior power to the PBJ and permutation procedures. We provide an R package (https://github.com/simonvandekar/pbj) to perform inference using the PBJ and sPBJ procedures.
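The sketch below is not the pbj package and uses a multiplier-style bootstrap as a stand-in for the parametric bootstrap described above; it only conveys the general idea of calibrating spatial extent inference by resampling null statistic maps that inherit the data's spatial covariance (a toy one-sample setting with 2D images; the array shapes, threshold, and function names are assumptions):

```python
# Conceptual sketch of bootstrap-calibrated spatial extent inference.
import numpy as np
from scipy.ndimage import label

def cluster_sizes(stat_map, threshold):
    """Sizes of suprathreshold clusters in a 2D statistic map."""
    lab, k = label(stat_map > threshold)
    return [int((lab == j).sum()) for j in range(1, k + 1)]

def sei_bootstrap_sketch(Y, threshold=3.0, n_boot=500, seed=0):
    """Y: (n_subjects, nx, ny) images.  Returns observed cluster sizes and a
    5% FWER critical cluster extent calibrated by resampled null maps."""
    rng = np.random.default_rng(seed)
    n, nx, ny = Y.shape
    flat = Y.reshape(n, -1)
    mean, sd = flat.mean(0), flat.std(0, ddof=1)
    z = np.sqrt(n) * mean / sd                        # one-sample z-like statistic map
    obs_sizes = cluster_sizes(z.reshape(nx, ny), threshold)

    # Resample null statistic maps that retain the spatial covariance of the
    # standardized residuals, and record the maximum cluster extent.
    resid = (flat - mean) / sd
    max_sizes = np.empty(n_boot)
    for b in range(n_boot):
        w = rng.standard_normal(n)                    # resampling multipliers
        z_null = (resid.T @ w) / np.sqrt(n)
        sizes = cluster_sizes(z_null.reshape(nx, ny), threshold)
        max_sizes[b] = max(sizes) if sizes else 0
    crit = np.quantile(max_sizes, 0.95)               # FWER-calibrated critical extent
    return obs_sizes, crit
```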
In this work, we introduce statistical testing under distributional shifts. We are interested in the hypothesis $P^* \in H_0$ for a target distribution $P^*$, but observe data from a different distribution $Q^*$. We assume that $P^*$ is related to $Q^*$ through a known shift $\tau$ and formally introduce hypothesis testing in this setting. We propose a general testing procedure that first resamples from the observed data to construct an auxiliary data set and then applies an existing test in the target domain. We prove that if the size of the resample is at most $o(\sqrt{n})$ and the resampling weights are well-behaved, this procedure inherits the pointwise asymptotic level and power from the target test. If the map $\tau$ is estimated from data, we can maintain the above guarantees under mild conditions if the estimation works sufficiently well. We further extend our results to uniform asymptotic level and a different resampling scheme. Testing under distributional shifts allows us to tackle a diverse set of problems. We argue that it may prove useful in reinforcement learning and covariate shift, we show how it reduces conditional to unconditional independence testing, and we provide example applications in causal inference.
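A hedged sketch of the resample-then-test idea on a toy Gaussian example; the specific shift, the resample-size exponent, and the choice of a one-sample t-test as the target-domain test are illustrative assumptions rather than the paper's recipe:

```python
# Observe X_1..X_n ~ Q, know the density ratio dP/dQ induced by the shift,
# draw a small weighted resample of size m = o(sqrt(n)), and run an ordinary
# test as if the resample came from the target distribution P.
import numpy as np
from scipy.stats import norm, ttest_1samp

rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(loc=1.0, scale=1.0, size=n)           # data from Q = N(1, 1)

# Known shift: target P = N(0, 1); density ratio dP/dQ evaluated at the data.
ratio = norm.pdf(x, 0.0, 1.0) / norm.pdf(x, 1.0, 1.0)

m = int(n ** 0.45)                                    # resample size, o(sqrt(n))
probs = ratio / ratio.sum()
idx = rng.choice(n, size=m, replace=False, p=probs)   # weighted resample
x_target = x[idx]                                     # treated as a sample from P

# Apply an existing test in the target domain, e.g. H0: mean of P equals 0.
res = ttest_1samp(x_target, popmean=0.0)
print(f"m = {m}, p-value = {res.pvalue:.3f}")
```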
In a multiple testing framework, we propose a method that identifies the interval with the highest estimated false discovery rate of P-values and rejects the corresponding null hypotheses. Unlike the Benjamini-Hochberg method, which does the same but over intervals with an endpoint at the origin, the new procedure scans all intervals. In parallel with \citet*{storey2004strong}, we show that this scan procedure provides strong control of the asymptotic false discovery rate. In addition, we investigate its asymptotic false non-discovery rate, deriving conditions under which it outperforms the Benjamini-Hochberg procedure. For example, the scan procedure is superior in power-law location models.
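For intuition only, the quantity being scanned can be illustrated as follows: the estimated FDR over an arbitrary interval (a, b] of P-values generalizes the Benjamini-Hochberg-style estimate for intervals anchored at the origin. The notation and toy data are mine, and the scan procedure's actual selection rule follows the paper and is not reproduced here:

```python
# Empirical FDR estimate for rejecting the P-values in an interval (a, b],
# contrasted with the BH-style estimate that only considers [0, t].
import numpy as np

def fdr_hat_interval(p, a, b):
    """Expected null count m*(b - a) over the observed count in (a, b]."""
    p = np.asarray(p)
    hits = np.count_nonzero((p > a) & (p <= b))
    return len(p) * (b - a) / max(hits, 1)

def fdr_hat_bh(p, t):
    """BH-style estimate: the same quantity for the interval [0, t]."""
    return fdr_hat_interval(p, 0.0, t)

# Example: signals whose P-values concentrate away from the origin.
rng = np.random.default_rng(0)
p = np.concatenate([rng.uniform(size=900), rng.uniform(0.02, 0.05, size=100)])
print(fdr_hat_bh(p, 0.05), fdr_hat_interval(p, 0.02, 0.05))
```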
There has been strong recent interest in testing interval null hypotheses for improved scientific inference. For example, Lakens et al. (2018) and Lakens and Harms (2017) use this approach to study whether there is a pre-specified meaningful treatment effect in gerontology and clinical trials, in contrast to the more traditional point null hypothesis that tests for any treatment effect. Two popular Bayesian approaches are available for interval null hypothesis testing. One is the standard Bayes factor and the other is the Region of Practical Equivalence (ROPE) procedure championed by Kruschke and others over many years. This paper establishes a formal connection between these two approaches, with two benefits. First, it helps to better understand and improve the ROPE procedure. Second, it leads to a simple and effective algorithm for computing the Bayes factor in a wide range of problems using draws from posterior distributions generated by standard Bayesian programs such as BUGS, JAGS and Stan. The tedious and error-prone task of coding custom-made software specific to Bayes factor computation is thereby avoided.
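One standard way to see the connection, and a sketch of the kind of posterior-draw computation alluded to above (not necessarily the paper's exact algorithm): when the interval null and its complement are defined by truncating a single encompassing prior, the Bayes factor equals the posterior odds of the interval divided by its prior odds, and both odds can be estimated directly from Monte Carlo draws such as those returned by BUGS, JAGS or Stan.

```python
# Hedged illustration of an interval-null Bayes factor from posterior draws.
import numpy as np

def interval_bayes_factor(posterior_draws, prior_draws, low, high):
    """BF_01 for H0: low <= theta <= high versus H1: theta outside the interval,
    estimated as (posterior odds of the interval) / (prior odds of the interval)."""
    post_in = np.mean((posterior_draws >= low) & (posterior_draws <= high))
    prior_in = np.mean((prior_draws >= low) & (prior_draws <= high))
    post_odds = post_in / (1.0 - post_in)
    prior_odds = prior_in / (1.0 - prior_in)
    return post_odds / prior_odds

# Toy conjugate-normal example standing in for MCMC output; the interval
# [-0.1, 0.1] plays the role of a ROPE around zero.
rng = np.random.default_rng(0)
prior = rng.normal(0.0, 1.0, size=200_000)            # theta ~ N(0, 1)
posterior = rng.normal(0.12, 0.05, size=200_000)       # stand-in posterior draws
print(interval_bayes_factor(posterior, prior, -0.1, 0.1))
```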