We present current methods for estimating treatment effects and spillover effects under interference, a term covering a broad class of situations in which a unit's outcome depends not only on the treatment received by that unit but also on the treatments received by other units. To the extent that units react to each other, interact, or otherwise transmit treatment effects, valid inference requires that we account for such interference, a departure from the traditional assumption that a unit's outcome is affected only by its own treatment assignment. Interference and the associated spillovers may be a nuisance, or they may be of substantive interest to the researcher. In this chapter, we focus on interference in the context of randomized experiments. We review methods for when interference occurs in a general network setting and then consider the special case where interference is contained within a hierarchical structure. Finally, we discuss the relationship between interference and contagion. We use the interference R package and simulated data to illustrate key points, consider efficient designs that allow estimation of treatment and spillover effects, and discuss recent empirical studies that attempt to capture such effects.
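The hierarchical (partial-interference) setting mentioned above can be illustrated with a toy two-stage simulation, written here in Python rather than with the interference R package; the design, saturation levels, and effect sizes are all hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-stage randomized design under partial interference: clusters are
# first assigned a high or low treatment saturation, then individuals
# within each cluster are randomized at that saturation.
n_clusters, cluster_size = 200, 20
saturation = rng.choice([0.3, 0.7], size=n_clusters)                    # stage 1
treated = rng.random((n_clusters, cluster_size)) < saturation[:, None]  # stage 2

# Hypothetical outcome model: direct effect 2.0 and spillover effect 1.0
# transmitted through the fraction of treated peers in the same cluster.
peer_share = (treated.sum(axis=1, keepdims=True) - treated) / (cluster_size - 1)
outcome = 2.0 * treated + 1.0 * peer_share + rng.normal(0, 1, treated.shape)

# Direct effect at a fixed saturation: treated vs. control means within
# clusters assigned the same saturation level.
for s in (0.3, 0.7):
    mask = saturation == s
    y, t = outcome[mask], treated[mask]
    direct = y[t].mean() - y[~t].mean()
    print(f"direct effect at saturation {s}: {direct:.2f}")

# Spillover effect on the untreated: compare untreated units across
# high- vs. low-saturation clusters.
y_hi = outcome[saturation == 0.7][~treated[saturation == 0.7]].mean()
y_lo = outcome[saturation == 0.3][~treated[saturation == 0.3]].mean()
spillover = y_hi - y_lo
print(f"spillover on untreated: {spillover:.2f}")
```

Both contrasts recover their simulated targets here only because assignment is randomized at both stages and interference stays within clusters.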
The literature on social network analysis has largely focused on methods and models that require complete network data; however, many networks can only be studied via sampling, whether because of the network's scale or complexity, because of access limitations, or because the population of interest is hard to reach. In such cases, random walk-based Markov chain Monte Carlo (MCMC) methods are commonly applied to estimate multiple network features, yet the reliability of these estimates has been largely ignored. We consider and further develop multivariate MCMC output analysis methods in the context of network sampling to directly address the reliability of the multivariate estimation. This approach yields principled, computationally efficient, and broadly applicable methods for assessing the Monte Carlo estimation procedure. In particular, for two random-walk algorithms, a simple random walk and a Metropolis-Hastings random walk, we construct and compare network parameter estimates, effective sample sizes, coverage probabilities, and stopping rules, all of which speak to the reliability of the estimation.
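As a toy illustration of the two walks compared above, the following sketch runs a simple random walk and a Metropolis-Hastings random walk on a small synthetic graph and estimates its mean degree; the graph and the inverse-degree reweighting estimator are illustrative assumptions, not the paper's setup:

```python
import random

random.seed(1)

# Toy graph: a ring of 60 nodes plus chords, stored as an adjacency dict.
n = 60
adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
for v in range(0, n, 3):                     # chords give unequal degrees
    adj[v].add((v + n // 2) % n)
    adj[(v + n // 2) % n].add(v)

def srw(start, steps):
    """Simple random walk: stationary probability proportional to degree."""
    v, out = start, []
    for _ in range(steps):
        v = random.choice(list(adj[v]))
        out.append(v)
    return out

def mhrw(start, steps):
    """Metropolis-Hastings random walk targeting the uniform distribution:
    accept a proposed move from v to u with probability min(1, deg(v)/deg(u))."""
    v, out = start, []
    for _ in range(steps):
        u = random.choice(list(adj[v]))
        if random.random() < len(adj[v]) / len(adj[u]):
            v = u
        out.append(v)
    return out

true_mean_degree = sum(len(adj[v]) for v in adj) / n

# The SRW is degree-biased, so estimate the mean degree with inverse-degree
# reweighting (a harmonic-mean estimator); the MHRW needs no correction.
walk = srw(0, 50_000)
srw_est = len(walk) / sum(1 / len(adj[v]) for v in walk)
mh_est = sum(len(adj[v]) for v in mhrw(0, 50_000)) / 50_000
print(f"true {true_mean_degree:.3f}, SRW {srw_est:.3f}, MHRW {mh_est:.3f}")
```

The reweighting works because under the SRW's stationary distribution the expected value of 1/deg is n/2m, so the harmonic mean of visited degrees converges to the mean degree 2m/n.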
In this article we identify social communities among gang members in the Hollenbeck policing district of Los Angeles, based on sparse observations of a combination of social interactions and geographic locations of the individuals. This information, drawn from LAPD Field Interview cards, is used to construct a similarity graph for the individuals. We use spectral clustering to identify clusters in the graph, corresponding to communities in Hollenbeck, and compare these with the LAPD's knowledge of the individuals' gang membership. We discuss different ways of encoding the geosocial information in a graph structure and their influence on the resulting clusterings. Finally, we analyze the robustness of this technique to noisy and incomplete data, thereby providing suggestions about the relative importance of the quantity versus the quality of collected data.
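A minimal sketch of the spectral-clustering step, using synthetic two-dimensional coordinates in place of the Field Interview data (the point clouds, Gaussian-kernel similarity, and two-way Fiedler-vector split are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two synthetic "communities": points near (0, 0) and near (5, 5),
# standing in for geosocial coordinates of individuals.
pts = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(5, 0.5, (30, 2))])
labels_true = np.array([0] * 30 + [1] * 30)

# Similarity graph via a Gaussian kernel on pairwise squared distances.
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 2.0)
np.fill_diagonal(W, 0.0)

# Unnormalized graph Laplacian L = D - W; the sign pattern of the Fiedler
# vector (eigenvector of the second-smallest eigenvalue) gives a
# two-way spectral partition.
L = np.diag(W.sum(1)) - W
eigvals, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
fiedler = eigvecs[:, 1]
labels_pred = (fiedler > 0).astype(int)

# Agreement with the ground truth, up to label swapping.
acc = max((labels_pred == labels_true).mean(), (labels_pred != labels_true).mean())
print(f"clustering accuracy: {acc:.2f}")
```

For more than two communities one would instead embed each node using several leading eigenvectors and run k-means on the embedding.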
Cluster randomized trials (CRTs) are popular in public health and the social sciences for evaluating a new treatment or policy that is randomly allocated to clusters of units rather than to individual units. CRTs often feature both noncompliance, when individuals within a cluster are not exposed to the intervention, and treatment spillovers, where those who comply with the new policy may affect the outcomes of those who do not. Here, we study the identification of causal effects in CRTs when both noncompliance and treatment spillovers are present. We prove that the standard instrumental-variables analysis of CRT data with noncompliance does not identify the usual complier average causal effect when treatment spillovers are present. We extend this result and show that no analysis of CRT data can unbiasedly estimate local network causal effects. Finally, we develop bounds for these causal effects under the assumption that the treatment is not harmful compared to the control. We demonstrate these results with an empirical study of a deworming intervention in Kenya.
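A toy simulation can illustrate why the standard Wald/IV estimator misses the complier average causal effect once spillovers reach noncompliers; the compliance rate and effect sizes below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy CRT: clusters randomized to treatment; within treated clusters only
# "compliers" take up the intervention, while noncompliers in treated
# clusters receive a spillover.
n_clusters, m = 500, 10
z = np.repeat(rng.random(n_clusters) < 0.5, m)     # cluster-level assignment
complier = rng.random(n_clusters * m) < 0.6
d = z & complier                                   # actual uptake
spill = z & ~complier                              # exposed but noncompliant
tau_c, tau_s = 2.0, 0.8                            # direct and spillover effects
y = tau_c * d + tau_s * spill + rng.normal(0, 1, n_clusters * m)

# Standard Wald/IV estimator: ITT effect on the outcome divided by the
# ITT effect on uptake. The spillover contaminates the numerator, so the
# estimate exceeds the true complier effect of 2.0.
wald = (y[z].mean() - y[~z].mean()) / (d[z].mean() - d[~z].mean())
print(f"Wald estimate: {wald:.2f}, true complier effect: {tau_c}")
```

Here the numerator picks up 0.6 x 2.0 + 0.4 x 0.8 = 1.52, so the Wald ratio converges to 1.52 / 0.6 ≈ 2.53 rather than 2.0.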
Objective: Provide guidance on sample size considerations for developing predictive models by empirically establishing the adequate sample size, which balances the competing objectives of improving model performance and reducing model complexity and computational requirements. Materials and Methods: We empirically assess the effect of sample size on prediction performance and model complexity by generating learning curves for 81 prediction problems in three large observational health databases, requiring the training of 17,248 prediction models. The adequate sample size was defined as the sample size at which a model's performance equalled the maximum model performance minus a small threshold value. Results: The adequate sample size achieves a median reduction in the number of observations of between 9.5% and 78.5% for threshold values between 0.001 and 0.02. The median reduction in the number of predictors in the models at the adequate sample size varied between 8.6% and 68.3% over the same threshold range. Discussion: Based on our results, a conservative yet significant reduction in sample size and model complexity can be estimated for future prediction work. However, if a researcher is willing to generate a learning curve, a much larger reduction in model complexity may be possible, as suggested by the large outcome-dependent variability. Conclusion: Our results suggest that in most cases only a fraction of the available data was sufficient to produce a model close in performance to one developed on the full data set, but with substantially reduced model complexity.
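The learning-curve procedure can be sketched as follows, using a synthetic prediction problem and a simple nearest-centroid classifier in place of the actual databases and models; the threshold and sample-size grid are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic binary prediction problem: two Gaussian classes in 5 dimensions.
def make_data(n):
    y = rng.random(n) < 0.5
    X = rng.normal(0, 1, (n, 5)) + 0.6 * y[:, None]
    return X, y

X_test, y_test = make_data(5000)

def accuracy(n_train):
    """Nearest-centroid classifier trained on n_train observations."""
    X, y = make_data(n_train)
    mu0, mu1 = X[~y].mean(0), X[y].mean(0)
    pred = ((X_test - mu1) ** 2).sum(1) < ((X_test - mu0) ** 2).sum(1)
    return (pred == y_test).mean()

# Learning curve over increasing sample sizes.
sizes = [50, 100, 200, 400, 800, 1600, 3200]
curve = [accuracy(n) for n in sizes]

# Adequate sample size: smallest n whose performance reaches the maximum
# observed performance minus a small threshold.
threshold = 0.01
adequate = next(n for n, a in zip(sizes, curve) if a >= max(curve) - threshold)
print(f"adequate sample size: {adequate} of {sizes[-1]} available")
```

Performance here plateaus well before the full sample is used, which is exactly the pattern the adequate-sample-size definition exploits.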
We consider Bayesian inference for stochastic differential equation mixed-effects models (SDEMEMs), exemplified by tumor response to treatment and regrowth in mice. We present an extensive study of how an SDEMEM can be fitted using both exact inference based on pseudo-marginal MCMC and approximate inference via Bayesian synthetic likelihood (BSL). We investigate a two-compartment SDEMEM, the compartments corresponding to the fractions of tumor cells killed by and surviving a treatment, respectively. The case study data come from a tumor xenograft study with two treatment groups and one control, each containing 5-8 mice. Results from the case study and from simulations indicate that the SDEMEM is able to reproduce the observed growth patterns and that BSL is a robust tool for inference in SDEMEMs. Finally, we compare the fit of the SDEMEM to that of a similar ordinary differential equation model. Owing to the small sample sizes, strong prior information is needed to identify all model parameters in the SDEMEM, and it cannot be determined which of the two models is better at predicting tumor growth curves. In a simulation study we find that with a sample of 17 mice per group, BSL is able to identify all model parameters and distinguish the treatment groups.
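A minimal sketch of simulating a two-compartment tumor model of this flavor with the Euler-Maruyama scheme; the dynamics, parameter values, and noise structure below are hypothetical stand-ins, not the fitted SDEMEM:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical two-compartment SDE for tumor volume: v1 = cells surviving
# treatment (regrowing at rate beta), v2 = cells killed by treatment
# (decaying at rate delta); the observed volume is v1 + v2.
beta, delta, sigma = 0.08, 0.15, 0.05
dt, n_steps = 0.1, 400

def simulate():
    v1, v2 = 0.3, 0.7            # initial fractions after treatment
    path = []
    for _ in range(n_steps):
        # Euler-Maruyama step with multiplicative noise on each compartment
        dw1, dw2 = rng.normal(0, np.sqrt(dt), 2)
        v1 += beta * v1 * dt + sigma * v1 * dw1
        v2 += -delta * v2 * dt + sigma * v2 * dw2
        path.append(v1 + v2)
    return np.array(path)

total = simulate()
print(f"total volume after one step: {total[0]:.2f}, at end: {total[-1]:.2f}")
```

Simulation-based methods such as BSL rely on exactly this kind of forward simulator: synthetic paths are generated at candidate parameter values and compared to the data through summary statistics.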