No Arabic abstract
This paper derives time-uniform confidence sequences (CS) for causal effects in experimental and observational settings. A confidence sequence for a target parameter $psi$ is a sequence of confidence intervals $(C_t)_{t=1}^infty$ such that every one of these intervals simultaneously captures $psi$ with high probability. Such CSs provide valid statistical inference for $psi$ at arbitrary stopping times, unlike classical fixed-time confidence intervals which require the sample size to be fixed in advance. Existing methods for constructing CSs focus on the nonasymptotic regime where certain assumptions (such as known bounds on the random variables) are imposed, while doubly robust estimators of causal effects rely on (asymptotic) semiparametric theory. We use sequenti
Frequentist inference has a well-established supporting theory for doubly robust causal inference based on the potential outcomes framework, which is realized via outcome regression (OR) and propensity score (PS) models. The Bayesian counterpart, however, is not obvious as the PS model loses its balancing property in joint modeling. In this paper, we propose a natural and formal Bayesian solution by bridging loss-type Bayesian inference with a utility function derived from the notion of a pseudo-population via the change of measure. Consistency of the posterior distribution is shown with correctly specified and misspecified OR models. Simulation studies suggest that our proposed method can estimate the true causal effect more efficiently and achieve the frequentist coverage if either the OR model is correctly specified or fit with a flexible function of the confounders, compared to the previous Bayesian approach via the Bayesian bootstrap. Finally, we apply this novel Bayesian method to assess the impact of speed cameras on the reduction of car collisions in England.
Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d_n$ and $n$ both increase to infinity together at some prescribed relative rate. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n gg d$, or $d_n/n approx 0.2$? This paper considers the goal of dimension-agnostic inference -- developing methods whose validity does not depend on any assumption on $d_n$. We introduce a new, generic approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonals. We exemplify our technique for a handful of classical problems including one-sample mean and covariance testing. Our tests are shown to have minimax rate-optimal power against appropriate local alternatives, and without explicitly targeting the high-dimensional setting their power is optimal up to a $sqrt 2$ factor. A hidden advantage is that our proofs are simple and transparent. We end by describing several fruitful open directions.
Important advances have recently been achieved in developing procedures yielding uniformly valid inference for a low dimensional causal parameter when high-dimensional nuisance models must be estimated. In this paper, we review the literature on uniformly valid causal inference and discuss the costs and benefits of using uniformly valid inference procedures. Naive estimation strategies based on regularisation, machine learning, or a preliminary model selection stage for the nuisance models have finite sample distributions which are badly approximated by their asymptotic distributions. To solve this serious problem, estimators which converge uniformly in distribution over a class of data generating mechanisms have been proposed in the literature. In order to obtain uniformly valid results in high-dimensional situations, sparsity conditions for the nuisance models need typically to be made, although a double robustness property holds, whereby if one of the nuisance model is more sparse, the other nuisance model is allowed to be less sparse. While uniformly valid inference is a highly desirable property, uniformly valid procedures pay a high price in terms of inflated variability. Our discussion of this dilemma is illustrated by the study of a double-selection outcome regression estimator, which we show is uniformly asymptotically unbiased, but is less variable than uniformly valid estimators in the numerical experiments conducted.
Confidence intervals based on penalized maximum likelihood estimators such as the LASSO, adaptive LASSO, and hard-thresholding are analyzed. In the known-variance case, the finite-sample coverage properties of such intervals are determined and it is shown that symmetric intervals are the shortest. The length of the shortest intervals based on the hard-thresholding estimator is larger than the length of the shortest interval based on the adaptive LASSO, which is larger than the length of the shortest interval based on the LASSO, which in turn is larger than the standard interval based on the maximum likelihood estimator. In the case where the penalized estimators are tuned to possess the `sparsity property, the intervals based on these estimators are larger than the standard interval by an order of magnitude. Furthermore, a simple asymptotic confidence interval construction in the `sparse case, that also applies to the smoothly clipped absolute deviation estimator, is discussed. The results for the known-variance case are shown to carry over to the unknown-variance case in an appropriate asymptotic sense.
We investigate the problem of inferring the causal predictors of a response $Y$ from a set of $d$ explanatory variables $(X^1,dots,X^d)$. Classical ordinary least squares regression includes all predictors that reduce the variance of $Y$. Using only the causal predictors instead leads to models that have the advantage of remaining invariant under interventions, loosely speaking they lead to invariance across different environments or heterogeneity patterns. More precisely, the conditional distribution of $Y$ given its causal predictors remains invariant for all observations. Recent work exploits such a stability to infer causal relations from data with different but known environments. We show that even without having knowledge of the environments or heterogeneity pattern, inferring causal relations is possible for time-ordered (or any other type of sequentially ordered) data. In particular, this allows detecting instantaneous causal relations in multivariate linear time series which is usually not the case for Granger causality. Besides novel methodology, we provide statistical confidence bounds and asymptotic detection results for inferring causal predictors, and present an application to monetary policy in macroeconomics.