Two-stage randomized experiments are becoming an increasingly popular experimental design for causal inference when the outcome of one unit may be affected by the treatment assignments of other units in the same cluster. In this paper, we provide a methodological framework with general tools for statistical inference and power analysis in two-stage randomized experiments. Under the randomization-based framework, we propose unbiased point estimators of direct and spillover effects, construct conservative variance estimators, develop hypothesis testing procedures, and derive sample size formulas. We also establish equivalence relationships between the randomization-based and regression-based methods. We theoretically compare the two-stage randomized design with the completely randomized and cluster randomized designs, which represent its two limiting designs. Finally, we conduct simulation studies to evaluate the empirical performance of our sample size formulas. For empirical illustration, the proposed methodology is applied to data from a field experiment on a job placement assistance program.
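To fix ideas, the following is a minimal simulation sketch of a two-stage randomized design together with simple difference-in-means contrasts for direct and spillover effects. The saturations (20% and 80%), effect sizes, and cluster counts are illustrative assumptions rather than values from the paper, and the contrasts shown are generic rather than the paper's specific estimators or variance formulas.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-stage design: clusters are first randomized to a low or high
# treatment saturation, then individuals are randomized to treatment within each
# cluster at the assigned saturation. All numbers below are illustrative only.
n_clusters, cluster_size = 100, 50
saturations = {0: 0.2, 1: 0.8}                  # low vs. high saturation arms
cluster_arm = rng.binomial(1, 0.5, n_clusters)  # stage 1: cluster-level assignment

rows = []
for c in range(n_clusters):
    p = saturations[cluster_arm[c]]
    z = rng.binomial(1, p, cluster_size)        # stage 2: individual assignment
    # Outcome with an assumed direct effect (2.0) and an assumed spillover of 0.5
    # per unit increase in the fraction of treated peers in the cluster.
    y = 1.0 + 2.0 * z + 0.5 * z.mean() + rng.normal(0, 1, cluster_size)
    rows.append((cluster_arm[c], z, y))

def mean_outcome(arm, treated):
    """Average of cluster-level mean outcomes for units with the given
    own-treatment status in clusters assigned to the given saturation arm."""
    vals = [y[z == treated].mean()
            for a, z, y in rows if a == arm and (z == treated).any()]
    return np.mean(vals)

# Direct effect within the low-saturation arm: treated vs. control units.
direct_low = mean_outcome(0, 1) - mean_outcome(0, 0)
# Spillover effect on untreated units: high- vs. low-saturation clusters.
spill_untreated = mean_outcome(1, 0) - mean_outcome(0, 0)
print(f"direct effect (low saturation): {direct_low:.2f}")
print(f"spillover effect (untreated):   {spill_untreated:.2f}")
```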
Cluster randomized trials (CRTs) are popular in public health and in the social sciences for evaluating a new treatment or policy that is randomly allocated to clusters of units rather than to individual units. CRTs often feature both noncompliance, where some individuals within a cluster are not exposed to the intervention, and treatment spillovers, where individuals within a cluster influence one another so that those who comply with the new policy may affect the outcomes of those who do not. Here, we study the identification of causal effects in CRTs when both noncompliance and treatment spillovers are present. We prove that the standard analysis of CRT data with noncompliance using instrumental variables does not identify the usual complier average causal effect when treatment spillovers are present. We extend this result and show that no analysis of CRT data can unbiasedly estimate local network causal effects. Finally, we develop bounds for these causal effects under the assumption that the treatment is not harmful compared to the control. We demonstrate these results with an empirical study of a deworming intervention in Kenya.
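The "standard analysis" referred to above is, in essence, a cluster-level Wald (instrumental-variables) estimator. The simulation below, with assumed compliance rates and effect sizes that are not taken from the paper, sketches how that estimator folds spillovers into its estimand; it is not an implementation of the paper's bounds.

```python
import numpy as np

# A minimal sketch of the standard IV (Wald) analysis of CRT data with
# noncompliance. Data are simulated; compliance rate, direct effect (1.0),
# and spillover effect (0.8) are illustrative assumptions.
rng = np.random.default_rng(1)
n_clusters, m = 200, 30
z_cluster = rng.binomial(1, 0.5, n_clusters)    # cluster-level assignment (instrument)

y_bar, d_bar = np.empty(n_clusters), np.empty(n_clusters)
for c in range(n_clusters):
    complier = rng.binomial(1, 0.6, m)          # 60% compliers (assumed)
    d = z_cluster[c] * complier                 # treatment uptake
    # Outcome with a direct effect of own uptake plus a spillover that grows
    # with the fraction of treated peers in the cluster.
    y = 0.5 + 1.0 * d + 0.8 * d.mean() + rng.normal(0, 1, m)
    y_bar[c], d_bar[c] = y.mean(), d.mean()

# Wald estimator: ITT effect on the outcome divided by ITT effect on uptake.
itt_y = y_bar[z_cluster == 1].mean() - y_bar[z_cluster == 0].mean()
itt_d = d_bar[z_cluster == 1].mean() - d_bar[z_cluster == 0].mean()
print(f"Wald/IV estimate: {itt_y / itt_d:.2f} "
      f"(differs from the assumed direct effect 1.0 because spillovers are folded in)")
```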
Cluster randomized controlled trials (cRCTs) are designed to evaluate interventions delivered to groups of individuals. A practical limitation of such designs is that the number of available clusters may be small, resulting in an increased risk of baseline imbalance under simple randomization. Constrained randomization overcomes this issue by restricting the allocation to a subset of randomization schemes in which sufficient overall covariate balance across comparison arms is achieved with respect to a pre-specified balance metric. However, several aspects of constrained randomization for the design and analysis of multi-arm cRCTs have not been fully investigated. Motivated by an ongoing multi-arm cRCT, we provide a comprehensive evaluation of the statistical properties of model-based and randomization-based tests under both simple and constrained randomization designs in multi-arm cRCTs, with varying combinations of design-based and analysis-based covariate adjustment strategies. In particular, as randomization-based tests have not been extensively studied in multi-arm cRCTs, we additionally develop most-powerful permutation tests under the linear mixed model framework for our comparisons. Our results indicate that under constrained randomization, both model-based and randomization-based analyses could gain power while preserving the nominal type I error rate, given proper analysis-based adjustment for the baseline covariates. The choice of balance metrics and candidate set size and their implications for the testing of the pairwise and global hypotheses are also discussed. Finally, we caution against the design and analysis of multi-arm cRCTs with an extremely small number of clusters, due to insufficient degrees of freedom and the tendency to obtain an overly restricted randomization space.
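As a concrete illustration of the design-stage idea, the following sketch performs covariate-constrained randomization for a hypothetical three-arm cRCT with a single cluster-level covariate. The balance metric (between-arm variance of covariate means), the 10% candidate-set cutoff, and all sample sizes are assumptions made for the example, not choices from the motivating trial.

```python
import numpy as np

rng = np.random.default_rng(2)
n_clusters, n_arms = 18, 3
x = rng.normal(0, 1, n_clusters)                 # one cluster-level covariate (assumed)

def balance_metric(assignment):
    """Between-arm variance of covariate means; smaller means better balance."""
    means = [x[assignment == a].mean() for a in range(n_arms)]
    return np.var(means)

# Simulate candidate allocation schemes with equal arm sizes, score each one,
# and restrict the randomization space to the best-balanced 10% of schemes.
base = np.repeat(np.arange(n_arms), n_clusters // n_arms)
schemes = np.array([rng.permutation(base) for _ in range(5000)])
scores = np.array([balance_metric(s) for s in schemes])
cutoff = np.quantile(scores, 0.10)
candidate_set = schemes[scores <= cutoff]

# The actual design randomly selects one scheme from the constrained candidate set.
chosen = candidate_set[rng.integers(len(candidate_set))]
print("chosen allocation:", chosen)
print("balance score:", round(float(balance_metric(chosen)), 4))
```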
We develop new semiparametric methods for estimating treatment effects. We focus on a setting where the outcome distributions may be thick-tailed, where treatment effects are small, where sample sizes are large, and where assignment is completely random. This setting is of particular interest in recent experimentation in tech companies. We propose using parametric models for the treatment effects, as opposed to parametric models for the full outcome distributions. This leads to semiparametric models for the outcome distributions. We derive the semiparametric efficiency bound for this setting and propose efficient estimators. In the case of a constant treatment effect, one of the proposed estimators has an interesting interpretation as a weighted average of quantile treatment effects, with the weights proportional to (minus) the second derivative of the log of the density of the potential outcomes. Our analysis also results in an extension of Huber's model and trimmed mean to include asymmetry, and in a simplified condition on linear combinations of order statistics, which may be of independent interest.
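One plausible way to write the constant-effect interpretation described above, hedged as a reading of the abstract rather than the paper's exact expression (the normalization and the estimation of the density are left unspecified here), is as a weighted average of quantile treatment effects:

```latex
% Hedged sketch: weighted average of quantile treatment effects, with weights
% proportional to minus the second derivative of the log density f of the
% potential outcomes (F is the corresponding distribution function).
\[
  \hat\tau
  = \frac{\int_0^1 w(u)\,\bigl[\hat F_1^{-1}(u) - \hat F_0^{-1}(u)\bigr]\,du}
         {\int_0^1 w(u)\,du},
  \qquad
  w(u) \propto -\,(\log f)''\bigl(F^{-1}(u)\bigr).
\]
```

As an informal check of this reading, when f is Gaussian, -(log f)'' is constant, so the weights are flat and the estimator reduces to the simple difference in means.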
This paper examines methods of inference concerning quantile treatment effects (QTEs) in randomized experiments with matched-pairs designs (MPDs). Standard multiplier bootstrap inference fails to capture the negative dependence of observations within each pair and is therefore conservative. Analytical inference involves estimating multiple functional quantities that require several tuning parameters. Instead, this paper proposes two bootstrap methods that can consistently approximate the limit distribution of the original QTE estimator and lessen the burden of tuning parameter choice. In particular, the inverse-propensity-score-weighted multiplier bootstrap can be implemented without knowledge of pair identities.
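For orientation, the sketch below implements a generic weighted (multiplier) bootstrap for the median treatment effect under complete randomization, ignoring pair identities. It corresponds to the standard procedure the abstract describes as conservative under matched-pairs designs, not to the two proposed methods; all data and tuning choices are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
d = rng.binomial(1, 0.5, n)                       # treatment indicator (assumed design)
y = 1.0 * d + rng.standard_t(df=3, size=n)        # heavy-tailed outcome, unit effect

def qte(weights, tau=0.5):
    """Weighted tau-quantile difference between treated and control outcomes."""
    est = []
    for g in (1, 0):
        w, v = weights[d == g], y[d == g]
        order = np.argsort(v)
        cum = np.cumsum(w[order]) / w.sum()
        est.append(v[order][np.searchsorted(cum, tau)])
    return est[0] - est[1]

# Point estimate with unit weights; bootstrap draws with exponential multipliers.
point = qte(np.ones(n))
draws = [qte(rng.exponential(1.0, n)) for _ in range(1000)]
lo, hi = np.quantile(draws, [0.025, 0.975])
print(f"QTE(0.5) = {point:.2f}, 95% bootstrap interval [{lo:.2f}, {hi:.2f}]")
```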
The two-stage process of propensity score analysis (PSA) includes a design stage, where propensity scores are estimated and implemented to approximate a randomized experiment, and an analysis stage, where treatment effects are estimated conditional upon the design. This paper considers how uncertainty associated with the design stage impacts estimation of causal effects in the analysis stage. Such design uncertainty can derive from the fact that the propensity score itself is an estimated quantity, but also from other features of the design stage tied to the choice of propensity score implementation. This paper offers a procedure for obtaining the posterior distribution of causal effects after marginalizing over a distribution of design-stage outputs, lending a degree of formality to Bayesian methods for PSA (BPSA) that have gained attention in recent literature. Formulation of a probability distribution for the design-stage output depends on how the propensity score is implemented in the design stage, and propagation of uncertainty into causal estimates depends on how the treatment effect is estimated in the analysis stage. We explore these differences within a sample of commonly used propensity score implementations (quantile stratification, nearest-neighbor matching, caliper matching, inverse probability of treatment weighting, and doubly robust estimation) and investigate in a simulation study the impact of the statistician's choice of propensity score model and implementation on the degree of between- and within-design variability in the estimated treatment effect. The methods are then deployed in an investigation of the association between levels of fine particulate air pollution and elevated exposure to emissions from coal-fired power plants.
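As a rough illustration of propagating design-stage uncertainty, the sketch below draws propensity-score coefficients from a normal approximation to their posterior, recomputes an inverse-probability-of-treatment-weighted (IPTW) estimate under each draw, and summarizes the resulting distribution of effect estimates. This is a generic sketch of the idea on simulated data, not the paper's BPSA procedures; the normal approximation and the Hajek-style weighting are assumptions made for the example.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2000
x = rng.normal(size=(n, 2))
ps_true = 1 / (1 + np.exp(-(0.5 * x[:, 0] - 0.5 * x[:, 1])))
d = rng.binomial(1, ps_true)                       # treatment assignment
y = 1.0 * d + x @ np.array([1.0, -1.0]) + rng.normal(size=n)

# Design stage: fit the propensity-score model and approximate its posterior
# by a normal distribution centered at the MLE with the estimated covariance.
X = sm.add_constant(x)
fit = sm.Logit(d, X).fit(disp=0)
coef_draws = rng.multivariate_normal(fit.params, fit.cov_params(), size=500)

# Analysis stage: an IPTW (Hajek) estimate of the ATE for each design-stage draw.
def iptw_ate(beta):
    e = 1 / (1 + np.exp(-X @ beta))
    w1, w0 = d / e, (1 - d) / (1 - e)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

ates = np.array([iptw_ate(b) for b in coef_draws])
print(f"ATE over design draws: mean {ates.mean():.2f}, sd {ates.std():.2f}")
```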