
The $s$-value: evaluating stability with respect to distributional shifts


Added by Suyash Gupta

Publication date: 2021

Field: Mathematical Statistics

Language: English

Authors: Suyash Gupta, Dominik Rothenhausler

Methodology, Statistics Theory, Machine Learning


Abstract

Common statistical measures of uncertainty such as $p$-values and confidence intervals quantify the uncertainty due to sampling, that is, the uncertainty due to not observing the full population. However, sampling is not the only source of uncertainty. In practice, distributions change between locations and across time. This makes it difficult to gather knowledge that transfers across data sets. We propose a measure of uncertainty or instability that quantifies the distributional instability of a statistical parameter with respect to Kullback-Leibler divergence, that is, the sensitivity of the parameter under general distributional perturbations within a Kullback-Leibler divergence ball. In addition, we propose measures to elucidate the instability of parameters with respect to directional or variable-specific shifts. Measuring instability with respect to directional shifts can be used to detect the type of shifts a parameter is sensitive to. We discuss how such knowledge can inform data collection for improved estimation of statistical parameters under shifted distributions. We evaluate the performance of the proposed measure on real data and show that it can elucidate the distributional (in-)stability of a parameter with respect to certain shifts and can be used to improve the accuracy of estimation under shifted distributions.
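To make the idea of a Kullback-Leibler ball concrete, here is a minimal sketch. It is not the authors' implementation and does not reproduce their exact definition of the $s$-value. It assumes the parameter is the mean of a single variable and that shifted distributions are reweightings $Q$ of the empirical distribution $P_n$; this is natural because $\mathrm{KL}(Q \| P_n)$ is finite only when $Q$ is supported on the observed points. The direction $\mathrm{KL}(Q \| P_n)$ is also our assumption. Under these assumptions, the closest $Q$ with mean $t$ is an exponential tilt, $dQ/dP_n \propto \exp(\lambda x)$, and the minimal divergence equals $\lambda t - \log M(\lambda)$, where $M(\lambda) = \frac{1}{n}\sum_i e^{\lambda x_i}$. The function names are hypothetical. A small minimal divergence means a mild shift already moves the parameter to $t$, so the parameter is unstable in that sense.

```python
import math


def _log_mean_exp(values):
    """Numerically stable log((1/n) * sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def tilted_mean(x, lam):
    """Mean of x under weights proportional to exp(lam * x_i)."""
    scores = [lam * xi for xi in x]
    m = max(scores)  # subtract the max before exponentiating to avoid overflow
    w = [math.exp(s - m) for s in scores]
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def min_kl_to_mean(x, target, iters=200):
    """Smallest KL(Q || P_n) over reweightings Q of the sample x with E_Q[X] = target.

    Assumption for illustration: the parameter is the mean and Q ranges over
    reweightings of the observed sample. The minimiser is an exponential tilt
    dQ/dP_n proportional to exp(lam * x). The tilted mean is increasing in lam,
    so lam is found by bisection.
    """
    if not (min(x) < target < max(x)):
        raise ValueError("target must lie strictly between min(x) and max(x)")
    lo, hi = -1.0, 1.0
    while tilted_mean(x, lo) > target:  # widen the bracket to the left
        lo *= 2.0
    while tilted_mean(x, hi) < target:  # widen the bracket to the right
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted_mean(x, mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    # KL(Q || P_n) = lam * target - log M(lam), with M the empirical MGF
    return lam * target - _log_mean_exp([lam * xi for xi in x])
```

For example, `min_kl_to_mean([0.0, 1.0], 0.75)` is approximately 0.1308. This agrees with the direct computation $0.25\log(0.5) + 0.75\log(1.5)$ for the reweighting $(0.25, 0.75)$. Calling `min_kl_to_mean(sample, 0.0)` measures how large a shift is needed to move the mean to zero, provided 0 lies inside the sample range.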


Statistical Testing under Distributional Shifts

Nikolaj Thams, Sorawit Saengkyongam, Niklas Pfister (2021)

In this work, we introduce statistical testing under distributional shifts. We are interested in the hypothesis $P^* \in H_0$ for a target distribution $P^*$, but observe data from a different distribution $Q^*$. We assume that $P^*$ is related to $Q^*$ through a known shift $\tau$ and formally introduce hypothesis testing in this setting. We propose a general testing procedure that first resamples from the observed data to construct an auxiliary data set and then applies an existing test in the target domain. We prove that if the size of the resample is at most $o(\sqrt{n})$ and the resampling weights are well-behaved, this procedure inherits the pointwise asymptotic level and power from the target test. If the map $\tau$ is estimated from data, we can maintain the above guarantees under mild conditions if the estimation works sufficiently well. We further extend our results to uniform asymptotic level and a different resampling scheme. Testing under distributional shifts allows us to tackle a diverse set of problems. We argue that it may prove useful in reinforcement learning and covariate shift, we show how it reduces conditional to unconditional independence testing and we provide example applications in causal inference.
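The resampling step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the shift is summarized by known nonnegative weights proportional to the density ratio $dP^*/dQ^*$ at each observation, and it draws $m$ points without replacement, each draw with probability proportional to the remaining weights. The helper name `weighted_resample` is hypothetical. The default $m = \lfloor n^{0.4} \rfloor$ is also hypothetical; it is one arbitrary choice satisfying $m = o(\sqrt{n})$.

```python
import random


def weighted_resample(data, weights, m=None, seed=None):
    """Draw m items from data without replacement, each draw proportional to weight.

    Hypothetical helper: weights are assumed to be nonnegative values proportional
    to the density ratio dP*/dQ* evaluated at each observation.
    """
    n = len(data)
    if len(weights) != n:
        raise ValueError("data and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    if m is None:
        m = max(1, int(n ** 0.4))  # assumed schedule; any m = o(sqrt(n)) would do
    if m > sum(1 for w in weights if w > 0):
        raise ValueError("m exceeds the number of positive weights")

    rng = random.Random(seed)
    remaining = list(range(n))  # indices not yet drawn
    chosen = []
    for _ in range(m):
        total = sum(weights[i] for i in remaining)
        u = rng.random() * total
        acc = 0.0
        pick = None
        for pos, i in enumerate(remaining):
            acc += weights[i]
            if weights[i] > 0 and u < acc:
                pick = pos
                break
        if pick is None:  # guard against floating-point round-off near total
            pick = max(p for p, i in enumerate(remaining) if weights[i] > 0)
        chosen.append(data[remaining.pop(pick)])
    return chosen
```

The returned list would then be passed to any test that is valid for data drawn from the target distribution. For example, `weighted_resample(list(range(100)), [1.0] * 50 + [0.0] * 50, m=5, seed=0)` returns five distinct values below 50, because zero-weight observations are never drawn.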

Methodology, Statistics Theory

A Note on Taylor's Expansion and Mean Value Theorem With Respect to a Random Variable

Yifan Yang, Xiaoyu Zhou (2021)

We introduce a stochastic version of Taylor's expansion and the Mean Value Theorem, originally proved by Aliprantis and Border (1999), and extend them to the multivariate case. In the univariate case, the theorem states the following. Suppose a real-valued function $f$ has a continuous derivative $f'$ on a closed interval $I$, and $X$ is a random variable on a probability space $(\Omega, \mathcal{F}, P)$ taking values in $I$. Fix $a \in I$. Then there exists a random variable $\xi$ such that $\xi(\omega) \in I$ for every $\omega \in \Omega$ and $f(X(\omega)) = f(a) + f'(\xi(\omega))(X(\omega) - a)$. The proof is not trivial. Applying these results in statistics can simplify some details in the proofs of the Delta method and of the asymptotic properties of a maximum likelihood estimator. In particular, when a proof states that there exists $\theta^*$ between $\hat{\theta}$ (a maximum likelihood estimator) and $\theta_0$ (the true value), the stochastic Mean Value Theorem guarantees that $\theta^*$ is a random variable (or a random vector).
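A short worked sketch shows where this matters in the Delta method. It assumes $\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, \sigma^2)$, that $\hat\theta$ takes values in a closed interval $I$ containing $\theta_0$, and that $g$ is continuously differentiable on $I$. These conditions are chosen to match the interval form of the theorem and are simplifications. Because $|\theta^* - \theta_0| \le |\hat\theta - \theta_0|$, consistency of $\hat\theta$ gives $\theta^* \to \theta_0$ in probability, so continuity gives $g'(\theta^*) \to g'(\theta_0)$ in probability. Slutsky's lemma then yields the limit. The stochastic Mean Value Theorem ensures that $\theta^*$ is a random variable, so that these convergence-in-probability statements are meaningful.

```latex
\begin{aligned}
g(\hat\theta) &= g(\theta_0) + g'(\theta^*)\,(\hat\theta - \theta_0),
  \qquad |\theta^* - \theta_0| \le |\hat\theta - \theta_0|, \\
\sqrt{n}\,\bigl(g(\hat\theta) - g(\theta_0)\bigr)
  &= g'(\theta^*)\,\sqrt{n}\,(\hat\theta - \theta_0)
  \xrightarrow{d} N\bigl(0,\ g'(\theta_0)^2 \sigma^2\bigr).
\end{aligned}
```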

Other Statistics, Statistics Theory

Distributional Consistency of Lasso by Perturbation Bootstrap

Debraj Das, S. N. Lahiri (2017)

The Least Absolute Shrinkage and Selection Operator, or Lasso, introduced by Tibshirani (1996), is a popular estimation procedure in multiple linear regression when the underlying design has a sparse structure, because it sets some regression coefficients exactly equal to 0. In this article, we develop a perturbation bootstrap method and establish its validity in approximating the distribution of the Lasso in heteroscedastic linear regression. We allow the underlying covariates to be either random or non-random. We show that the proposed bootstrap method works irrespective of the nature of the covariates. By contrast, the resample-based bootstrap of Freedman (1981) must be tailored to the nature (random vs. non-random) of the covariates. A simulation study also supports our method in finite samples.

Methodology, Statistics Theory

Robust high dimensional factor models with applications to statistical machine learning

Jianqing Fan, Kaizheng Wang, Yiqiao Zhong (2018)

Factor models are a class of powerful statistical models that have been widely used to deal with dependent measurements that arise frequently in applications ranging from genomics and neuroscience to economics and finance. As data are collected at an ever-growing scale, statistical machine learning faces new challenges: high dimensionality, strong dependence among observed variables, heavy-tailed variables and heterogeneity. High-dimensional robust factor analysis serves as a powerful toolkit to conquer these challenges. This paper gives a selective overview of recent advances in high-dimensional factor models and their applications to statistics, including Factor-Adjusted Robust Model selection (FarmSelect) and Factor-Adjusted Robust Multiple testing (FarmTest). We show that classical methods, especially principal component analysis (PCA), can be tailored to many new problems and provide powerful tools for statistical estimation and inference. We highlight PCA and its connections to matrix perturbation theory, robust statistics, random projection, false discovery rate, etc., and illustrate through several applications how insights from these fields yield solutions to modern challenges. We also present far-reaching connections between factor models and popular statistical learning problems, including network analysis and low-rank matrix recovery.

Methodology, Statistics Theory, Machine Learning

Guaranteed Functional Tensor Singular Value Decomposition

Rungang Han, Pixu Shi, Anru R. Zhang (2021)

This paper introduces the functional tensor singular value decomposition (FTSVD), a novel dimension reduction framework for tensors with one functional mode and several tabular modes. The problem is motivated by high-order longitudinal data analysis. Our model assumes the observed data to be a random realization of an approximate CP low-rank functional tensor measured on a discrete time grid. Incorporating tensor algebra and the theory of Reproducing Kernel Hilbert Space (RKHS), we propose a novel RKHS-based constrained power iteration with spectral initialization. Our method can successfully estimate both the singular vectors and the singular functions of the low-rank structure in the observed data. Under mild assumptions, we establish non-asymptotic contractive error bounds for the proposed algorithm. The superiority of the proposed framework is demonstrated via extensive experiments on both simulated and real data.

Methodology, Statistics Theory, Applications
