No Arabic abstract
We discuss causal mediation analyses for survival data and propose a new approach based on the additive hazards model. The emphasis is on a dynamic point of view, that is, understanding how the direct and indirect effects develop over time. Hence, importantly, we allow for a time varying mediator. To define direct and indirect effects in such a longitudinal survival setting we take an interventional approach (Didelez (2018)) where treatment is separated into one aspect affecting the mediator and a different aspect affecting survival. In general, this leads to a version of the non-parametric g-formula (Robins (1986)). In the present paper, we demonstrate that combining the g-formula with the additive hazards model and a sequential linear model for the mediator process results in simple and interpretable expressions for direct and indirect effects in terms of relative survival as well as cumulative hazards. Our results generalise and formalise the method of dynamic path analysis (Fosen et al. (2006), Strohmaier et al. (2015)). An application to data from a clinical trial on blood pressure medication is given.
In the process of clinical diagnosis and treatment, the restricted mean survival time (RMST), which reflects the life expectancy of patients up to a specified time, can be used as an appropriate outcome measure. However, the RMST only calculates the mean survival time of patients within a period of time after the start of follow-up and may not accurately portray the change in a patients life expectancy over time. The life expectancy can be adjusted for the time the patient has already survived and defined as the conditional restricted mean survival time (cRMST). A dynamic RMST model based on the cRMST can be established by incorporating time-dependent covariates and covariates with time-varying effects. We analysed data from a study of primary biliary cirrhosis (PBC) to illustrate the use of the dynamic RMST model. The predictive performance was evaluated using the C-index and the prediction error. The proposed dynamic RMST model, which can explore the dynamic effects of prognostic factors on survival time, has better predictive performance than the RMST model. Three PBC patient examples were used to illustrate how the predicted cRMST changed at different prediction times during follow-up. The use of the dynamic RMST model based on the cRMST allows for optimization of evidence-based decision-making by updating personalized dynamic life expectancy for patients.
Estimating causal effects for survival outcomes in the high-dimensional setting is an extremely important topic for many biomedical applications as well as areas of social sciences. We propose a new orthogonal score method for treatment effect estimation and inference that results in asymptotically valid confidence intervals assuming only good estimation properties of the hazard outcome model and the conditional probability of treatment. This guarantee allows us to provide valid inference for the conditional treatment effect under the high-dimensional additive hazards model under considerably more generality than existing approaches. In addition, we develop a new Hazards Difference (HDi), estimator. We showcase that our approach has double-robustness properties in high dimensions: with cross-fitting, the HDi estimate is consistent under a wide variety of treatment assignment models; the HDi estimate is also consistent when the hazards model is misspecified and instead the true data generating mechanism follows a partially linear additive hazards model. We further develop a novel sparsity doubly robust result, where either the outcome or the treatment model can be a fully dense high-dimensional model. We apply our methods to study the treatment effect of radical prostatectomy versus conservative management for prostate cancer patients using the SEER-Medicare Linked Data.
We study explained variation under the additive hazards regression model for right-censored data. We consider different approaches for developing such a measure, and focus on one that estimates the proportion of variation in the failure time explained by the covariates. We study the properties of the measure both analytically, and through extensive simulations. We apply the measure to a well-known survival data set as well as the linked Surveillance, Epidemiology and End Results (SEER)-Medicare database for prediction of mortality in early-stage prostate cancer patients using high dimensional claims codes.
Antibodies, an essential part of our immune system, develop through an intricate process to bind a wide array of pathogens. This process involves randomly mutating DNA sequences encoding these antibodies to find variants with improved binding, though mutations are not distributed uniformly across sequence sites. Immunologists observe this nonuniformity to be consistent with mutation motifs, which are short DNA subsequences that affect how likely a given site is to experience a mutation. Quantifying the effect of motifs on mutation rates is challenging: a large number of possible motifs makes this statistical problem high dimensional, while the unobserved history of the mutation process leads to a nontrivial missing data problem. We introduce an $ell_1$-penalized proportional hazards model to infer mutation motifs and their effects. In order to estimate model parameters, our method uses a Monte Carlo EM algorithm to marginalize over the unknown ordering of mutations. We show that our method performs better on simulated data compared to current methods and leads to more parsimonious models. The application of proportional hazards to mutation processes is, to our knowledge, novel and formalizes the current methods in a statistical framework that can be easily extended to analyze the effect of other biological features on mutation rates.
Online experimentation is at the core of Booking.coms customer-centric product development. While randomised controlled trials are a powerful tool for estimating the overall effects of product changes on business metrics, they often fall short in explaining the mechanism of change. This becomes problematic when decision-making depends on being able to distinguish between the direct effect of a treatment on some outcome variable and its indirect effect via a mediator variable. In this paper, we demonstrate the need for mediation analyses in online experimentation, and use simulated data to show how these methods help identify and estimate direct causal effect. Failing to take into account all confounders can lead to biased estimates, so we include sensitivity analyses to help gauge the robustness of estimates to missing causal factors.