Do you want to publish a course? Click here

Statistical Testing under Distributional Shifts

72   0   0.0 ( 0 )
 Added by Nikolaj Thams
 Publication date 2021
and research's language is English




Ask ChatGPT about the research

In this work, we introduce statistical testing under distributional shifts. We are interested in the hypothesis $P^* in H_0$ for a target distribution $P^*$, but observe data from a different distribution $Q^*$. We assume that $P^*$ is related to $Q^*$ through a known shift $tau$ and formally introduce hypothesis testing in this setting. We propose a general testing procedure that first resamples from the observed data to construct an auxiliary data set and then applies an existing test in the target domain. We prove that if the size of the resample is at most $o(sqrt{n})$ and the resampling weights are well-behaved, this procedure inherits the pointwise asymptotic level and power from the target test. If the map $tau$ is estimated from data, we can maintain the above guarantees under mild conditions if the estimation works sufficiently well. We further extend our results to uniform asymptotic level and a different resampling scheme. Testing under distributional shifts allows us to tackle a diverse set of problems. We argue that it may prove useful in reinforcement learning and covariate shift, we show how it reduces conditional to unconditional independence testing and we provide example applications in causal inference.



rate research

Read More

Common statistical measures of uncertainty such as $p$-values and confidence intervals quantify the uncertainty due to sampling, that is, the uncertainty due to not observing the full population. However, sampling is not the only source of uncertainty. In practice, distributions change between locations and across time. This makes it difficult to gather knowledge that transfers across data sets. We propose a measure of uncertainty or instability that quantifies the distributional instability of a statistical parameter with respect to Kullback-Leibler divergence, that is, the sensitivity of the parameter under general distributional perturbations within a Kullback-Leibler divergence ball. In addition, we propose measures to elucidate the instability of parameters with respect to directional or variable-specific shifts. Measuring instability with respect to directional shifts can be used to detect the type of shifts a parameter is sensitive to. We discuss how such knowledge can inform data collection for improved estimation of statistical parameters under shifted distributions. We evaluate the performance of the proposed measure on real data and show that it can elucidate the distributional (in-)stability of a parameter with respect to certain shifts and can be used to improve the accuracy of estimation under shifted distributions.
157 - Debraj Das , S. N. Lahiri 2017
Least Absolute Shrinkage and Selection Operator or the Lasso, introduced by Tibshirani (1996), is a popular estimation procedure in multiple linear regression when underlying design has a sparse structure, because of its property that it sets some regression coefficients exactly equal to 0. In this article, we develop a perturbation bootstrap method and establish its validity in approximating the distribution of the Lasso in heteroscedastic linear regression. We allow the underlying covariates to be either random or non-random. We show that the proposed bootstrap method works irrespective of the nature of the covariates, unlike the resample-based bootstrap of Freedman (1981) which must be tailored based on the nature (random vs non-random) of the covariates. Simulation study also justifies our method in finite samples.
141 - Z. Bai , D. Jiang , J. Yao 2012
For a multivariate linear model, Wilks likelihood ratio test (LRT) constitutes one of the cornerstone tools. However, the computation of its quantiles under the null or the alternative requires complex analytic approximations and more importantly, these distributional approximations are feasible only for moderate dimension of the dependent variable, say $ple 20$. On the other hand, assuming that the data dimension $p$ as well as the number $q$ of regression variables are fixed while the sample size $n$ grows, several asymptotic approximations are proposed in the literature for Wilks $bLa$ including the widely used chi-square approximation. In this paper, we consider necessary modifications to Wilks test in a high-dimensional context, specifically assuming a high data dimension $p$ and a large sample size $n$. Based on recent random matrix theory, the correction we propose to Wilks test is asymptotically Gaussian under the null and simulations demonstrate that the corrected LRT has very satisfactory size and power, surely in the large $p$ and large $n$ context, but also for moderately large data dimensions like $p=30$ or $p=50$. As a byproduct, we give a reason explaining why the standard chi-square approximation fails for high-dimensional data. We also introduce a new procedure for the classical multiple sample significance test in MANOVA which is valid for high-dimensional data.
We propose a new adaptive empirical Bayes framework, the Bag-Of-Null-Statistics (BONuS) procedure, for multiple testing where each hypothesis testing problem is itself multivariate or nonparametric. BONuS is an adaptive and interactive knockoff-type method that helps improve the testing power while controlling the false discovery rate (FDR), and is closely connected to the counting knockoffs procedure analyzed in Weinstein et al. (2017). Contrary to procedures that start with a $p$-value for each hypothesis, our method analyzes the entire data set to adaptively estimate an optimal $p$-value transform based on an empirical Bayes model. Despite the extra adaptivity, our method controls FDR in finite samples even if the empirical Bayes model is incorrect or the estimation is poor. An extension, the Double BONuS procedure, validates the empirical Bayes model to guard against power loss due to model misspecification.
Standardization has been a widely adopted practice in multiple testing, for it takes into account the variability in sampling and makes the test statistics comparable across different study units. However, despite conventional wisdom to the contrary, we show that there can be a significant loss in information from basing hypothesis tests on standardized statistics rather than the full data. We develop a new class of heteroscedasticity--adjusted ranking and thresholding (HART) rules that aim to improve existing methods by simultaneously exploiting commonalities and adjusting heterogeneities among the study units. The main idea of HART is to bypass standardization by directly incorporating both the summary statistic and its variance into the testing procedure. A key message is that the variance structure of the alternative distribution, which is subsumed under standardized statistics, is highly informative and can be exploited to achieve higher power. The proposed HART procedure is shown to be asymptotically valid and optimal for false discovery rate (FDR) control. Our simulation results demonstrate that HART achieves substantial power gain over existing methods at the same FDR level. We illustrate the implementation through a microarray analysis of myeloma.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا