No Arabic abstract
In randomized experiments, interactions between units might generate a treatment diffusion process. This is common when the treatment of interest is an actual object or product that can be shared among peers (e.g., flyers, booklets, videos). For instance, if the intervention of interest is an information campaign realized through the distribution of a video to targeted individuals, some of these treated individuals might share the video they received with their friends. Such a phenomenon is usually unobserved, causing a misallocation of individuals in the two treatment arms: some of the initially untreated units might have actually received the treatment by diffusion. Treatment misclassification can, in turn, introduce a bias in the estimation of the causal effect. Inspired by a recent field experiment on the effect of different types of school incentives aimed at encouraging students to attend cultural events, we present a novel approach to deal with a hidden diffusion process on observed or partially observed networks.Specifically, we develop a simulation-based sensitivity analysis that assesses the robustness of the estimates against the possible presence of a treatment diffusion. We simulate several diffusion scenarios within a plausible range of sensitivity parameters and we compare the treatment effect which is estimated in each scenario with the one that is obtained while ignoring the diffusion process. Results suggest that even a treatment diffusion parameter of small size may lead to a significant bias in the estimation of the treatment effect.
Real-world networks such as social and communication networks are too large to be observed entirely. Such networks are often partially observed such that network size, network topology, and nodes of the original network are unknown. In this paper we formalize the Adaptive Graph Exploring problem. We assume that we are given an incomplete snapshot of a large network and additional nodes can be discovered by querying nodes in the currently observed network. The goal of this problem is to maximize the number of observed nodes within a given query budget. Querying which set of nodes maximizes the size of the observed network? We formulate this problem as an exploration-exploitation problem and propose a novel nonparametric multi-arm bandit (MAB) algorithm for identifying which nodes to be queried. Our contributions include: (1) $i$KNN-UCB, a novel nonparametric MAB algorithm, applies $k$-nearest neighbor UCB to the setting when the arms are presented in a vector space, (2) provide theoretical guarantee that $i$KNN-UCB algorithm has sublinear regret, and (3) applying $i$KNN-UCB algorithm on synthetic networks and real-world networks from different domains, we show that our method discovers up to 40% more nodes compared to existing baselines.
Analyses of environmental phenomena often are concerned with understanding unlikely events such as floods, heatwaves, droughts or high concentrations of pollutants. Yet the majority of the causal inference literature has focused on modelling means, rather than (possibly high) quantiles. We define a general estimator of the population quantile treatment (or exposure) effects (QTE) -- the weighted QTE (WQTE) -- of which the population QTE is a special case, along with a general class of balancing weights incorporating the propensity score. Asymptotic properties of the proposed WQTE estimators are derived. We further propose and compare propensity score regression and two weighted methods based on these balancing weights to understand the causal effect of an exposure on quantiles, allowing for the exposure to be binary, discrete or continuous. Finite sample behavior of the three estimators is studied in simulation. The proposed methods are applied to data taken from the Bavarian Danube catchment area to estimate the 95% QTE of phosphorus on copper concentration in the river.
Irregular functional data in which densely sampled curves are observed over different ranges pose a challenge for modeling and inference, and sensitivity to outlier curves is a concern in applications. Motivated by applications in quantitative ultrasound signal analysis, this paper investigates a class of robust M-estimators for partially observed functional data including functional location and quantile estimators. Consistency of the estimators is established under general conditions on the partial observation process. Under smoothness conditions on the class of M-estimators, asymptotic Gaussian process approximations are established and used for large sample inference. The large sample approximations justify a bootstrap approximation for robust inferences about the functional response process. The performance is demonstrated in simulations and in the analysis of irregular functional data from quantitative ultrasound analysis.
Understanding how and how far information, behaviors, or pathogens spread in social networks is an important problem, having implications for both predicting the size of epidemics, as well as for planning effective interventions. There are, however, two main challenges for inferring spreading paths in real-world networks. One is the practical difficulty of observing a dynamic process on a network, and the other is the typical constraint of only partially observing a network. Using a static, structurally realistic social network as a platform for simulations, we juxtapose three distinct paths: (1) the stochastic path taken by a simulated spreading process from source to target; (2) the topologically shortest path in the fully observed network, and hence the single most likely stochastic path, between the two nodes; and (3) the topologically shortest path in a partially observed network. In a sampled network, how closely does the partially observed shortest path (3) emulate the unobserved spreading path (1)? Although partial observation inflates the length of the shortest path, the stochastic nature of the spreading process also frequently derails the dynamic path from the shortest path. We find that the partially observed shortest path does not necessarily give an inflated estimate of the length of the process path; in fact, partial observation may, counterintuitively, make the path seem shorter than it actually is.
Forest-based methods have recently gained in popularity for non-parametric treatment effect estimation. Building on this line of work, we introduce causal survival forests, which can be used to estimate heterogeneous treatment effects in a survival and observational setting where outcomes may be right-censored. Our approach relies on orthogonal estimating equations to robustly adjust for both censoring and selection effects. In our experiments, we find our approach to perform well relative to a number of baselines.