No Arabic abstract
No unmeasured confounding is often assumed in estimating treatment effects in observational data when using approaches such as propensity scores and inverse probability weighting. However, in many such studies due to the limitation of the databases, collected confounders are not exhaustive, and it is crucial to examine the extent to which the resulting estimate is sensitive to the unmeasured confounders. We consider this problem for survival and competing risks data. Due to the complexity of models for such data, we adapt the simulated potential confounders approach of Carnegie et al. (2016), which provides a general tool for sensitivity analysis due to unmeasured confounding. More specifically, we specify one sensitivity parameter to quantify the association between an unmeasured confounder and the treatment assignment, and another set of parameters to quantify the association between the confounder and the time-to-event outcomes. By varying the magnitudes of the sensitivity parameters, we estimate the treatment effect of interest using the stochastic EM and the EM algorithms. We demonstrate the performance of our methods on simulated data, and apply them to a comparative effectiveness study in inflammatory bowel disease (IBD).
Establishing cause-effect relationships from observational data often relies on untestable assumptions. It is crucial to know whether, and to what extent, the conclusions drawn from non-experimental studies are robust to potential unmeasured confounding. In this paper, we focus on the average causal effect (ACE) as our target of inference. We build on the work of Franks et al. (2019)and Robins (2000) by specifying non-identified sensitivity parameters that govern a contrast between the conditional (on measured covariates) distributions of the outcome under treatment (control) between treated and untreated individuals. We use semiparametric theory to derive the non-parametric efficient influence function of the ACE, for fixed sensitivity parameters. We use this influence function to construct a one-step bias-corrected estimator of the ACE. Our estimator depends on semiparametric models for the distribution of the observed data; importantly, these models do not impose any restrictions on the values of sensitivity analysis parameters. We establish sufficient conditions ensuring that our estimator has root-n asymptotics. We use our methodology to evaluate the causal effect of smoking during pregnancy on birth weight. We also evaluate the performance of estimation procedure in a simulation study.
Survival analysis in the presence of multiple possible adverse events, i.e., competing risks, is a pervasive problem in many industries (healthcare, finance, etc.). Since only one event is typically observed, the incidence of an event of interest is often obscured by other related competing events. This nonidentifiability, or inability to estimate true cause-specific survival curves from empirical data, further complicates competing risk survival analysis. We introduce Siamese Survival Prognosis Network (SSPN), a novel deep learning architecture for estimating personalized risk scores in the presence of competing risks. SSPN circumvents the nonidentifiability problem by avoiding the estimation of cause-specific survival curves and instead determines pairwise concordant time-dependent risks, where longer event times are assigned lower risks. Furthermore, SSPN is able to directly optimize an approximation to the C-discrimination index, rather than relying on well-known metrics which are unable to capture the unique requirements of survival analysis with competing risks.
The health impact of long-term exposure to air pollution is now routinely estimated using spatial ecological studies, due to the recent widespread availability of spatial referenced pollution and disease data. However, this areal unit study design presents a number of statistical challenges, which if ignored have the potential to bias the estimated pollution-health relationship. One such challenge is how to control for the spatial autocorrelation present in the data after accounting for the known covariates, which is caused by unmeasured confounding. A second challenge is how to adjust the functional form of the model to account for the spatial misalignment between the pollution and disease data, which causes within-area variation in the pollution data. These challenges have largely been ignored in existing long-term spatial air pollution and health studies, so here we propose a novel Bayesian hierarchical model that addresses both challenges, and provide software to allow others to apply our model to their own data. The effectiveness of the proposed model is compared by simulation against a number of state of the art alternatives proposed in the literature, and is then used to estimate the impact of nitrogen dioxide and particulate matter concentrations on respiratory hospital admissions in a new epidemiological study in England in 2010 at the Local Authority level.
Data-driven individualized decision making has recently received increasing research interests. Most existing methods rely on the assumption of no unmeasured confounding, which unfortunately cannot be ensured in practice especially in observational studies. Motivated by the recent proposed proximal causal inference, we develop several proximal learning approaches to estimating optimal individualized treatment regimes (ITRs) in the presence of unmeasured confounding. In particular, we establish several identification results for different classes of ITRs, exhibiting the trade-off between the risk of making untestable assumptions and the value function improvement in decision making. Based on these results, we propose several classification-based approaches to finding a variety of restricted in-class optimal ITRs and develop their theoretical properties. The appealing numerical performance of our proposed methods is demonstrated via an extensive simulation study and one real data application.
We apply Gaussian process (GP) regression, which provides a powerful non-parametric probabilistic method of relating inputs to outputs, to survival data consisting of time-to-event and covariate measurements. In this context, the covariates are regarded as the `inputs and the event times are the `outputs. This allows for highly flexible inference of non-linear relationships between covariates and event times. Many existing methods, such as the ubiquitous Cox proportional hazards model, focus primarily on the hazard rate which is typically assumed to take some parametric or semi-parametric form. Our proposed model belongs to the class of accelerated failure time models where we focus on directly characterising the relationship between covariates and event times without any explicit assumptions on what form the hazard rates take. It is straightforward to include various types and combinations of censored and truncated observations. We apply our approach to both simulated and experimental data. We then apply multiple output GP regression, which can handle multiple potentially correlated outputs for each input, to competing risks survival data where multiple event types can occur. By tuning one of the model parameters we can control the extent to which the multiple outputs (the time-to-event for each risk) are dependent thus allowing the specification of correlated risks. Simulation studies suggest that in some cases assuming dependence can lead to more accurate predictions.