No Arabic abstract
Survival analysis in the presence of multiple possible adverse events, i.e., competing risks, is a pervasive problem in many industries (healthcare, finance, etc.). Since only one event is typically observed, the incidence of an event of interest is often obscured by other related competing events. This nonidentifiability, or inability to estimate true cause-specific survival curves from empirical data, further complicates competing risk survival analysis. We introduce Siamese Survival Prognosis Network (SSPN), a novel deep learning architecture for estimating personalized risk scores in the presence of competing risks. SSPN circumvents the nonidentifiability problem by avoiding the estimation of cause-specific survival curves and instead determines pairwise concordant time-dependent risks, where longer event times are assigned lower risks. Furthermore, SSPN is able to directly optimize an approximation to the C-discrimination index, rather than relying on well-known metrics which are unable to capture the unique requirements of survival analysis with competing risks.
We apply Gaussian process (GP) regression, which provides a powerful non-parametric probabilistic method of relating inputs to outputs, to survival data consisting of time-to-event and covariate measurements. In this context, the covariates are regarded as the `inputs and the event times are the `outputs. This allows for highly flexible inference of non-linear relationships between covariates and event times. Many existing methods, such as the ubiquitous Cox proportional hazards model, focus primarily on the hazard rate which is typically assumed to take some parametric or semi-parametric form. Our proposed model belongs to the class of accelerated failure time models where we focus on directly characterising the relationship between covariates and event times without any explicit assumptions on what form the hazard rates take. It is straightforward to include various types and combinations of censored and truncated observations. We apply our approach to both simulated and experimental data. We then apply multiple output GP regression, which can handle multiple potentially correlated outputs for each input, to competing risks survival data where multiple event types can occur. By tuning one of the model parameters we can control the extent to which the multiple outputs (the time-to-event for each risk) are dependent thus allowing the specification of correlated risks. Simulation studies suggest that in some cases assuming dependence can lead to more accurate predictions.
Survival analysis is a hotspot in statistical research for modeling time-to-event information with data censorship handling, which has been widely used in many applications such as clinical research, information system and other fields with survivorship bias. Many works have been proposed for survival analysis ranging from traditional statistic methods to machine learning models. However, the existing methodologies either utilize counting-based statistics on the segmented data, or have a pre-assumption on the event probability distribution w.r.t. time. Moreover, few works consider sequential patterns within the feature space. In this paper, we propose a Deep Recurrent Survival Analysis model which combines deep learning for conditional probability prediction at fine-grained level of the data, and survival analysis for tackling the censorship. By capturing the time dependency through modeling the conditional probability of the event for each sample, our method predicts the likelihood of the true event occurrence and estimates the survival rate over time, i.e., the probability of the non-occurrence of the event, for the censored data. Meanwhile, without assuming any specific form of the event probability distribution, our model shows great advantages over the previous works on fitting various sophisticated data distributions. In the experiments on the three real-world tasks from different fields, our model significantly outperforms the state-of-the-art solutions under various metrics.
No unmeasured confounding is often assumed in estimating treatment effects in observational data when using approaches such as propensity scores and inverse probability weighting. However, in many such studies due to the limitation of the databases, collected confounders are not exhaustive, and it is crucial to examine the extent to which the resulting estimate is sensitive to the unmeasured confounders. We consider this problem for survival and competing risks data. Due to the complexity of models for such data, we adapt the simulated potential confounders approach of Carnegie et al. (2016), which provides a general tool for sensitivity analysis due to unmeasured confounding. More specifically, we specify one sensitivity parameter to quantify the association between an unmeasured confounder and the treatment assignment, and another set of parameters to quantify the association between the confounder and the time-to-event outcomes. By varying the magnitudes of the sensitivity parameters, we estimate the treatment effect of interest using the stochastic EM and the EM algorithms. We demonstrate the performance of our methods on simulated data, and apply them to a comparative effectiveness study in inflammatory bowel disease (IBD).
Semi-parametric survival analysis methods like the Cox Proportional Hazards (CPH) regression (Cox, 1972) are a popular approach for survival analysis. These methods involve fitting of the log-proportional hazard as a function of the covariates and are convenient as they do not require estimation of the baseline hazard rate. Recent approaches have involved learning non-linear representations of the input covariates and demonstrate improved performance. In this paper we argue against such deep parameterizations for survival analysis and experimentally demonstrate that more interpretable semi-parametric models inspired from mixtures of experts perform equally well or in some cases better than such overly parameterized deep models.
Black Sigatoka disease severely decreases global banana production, and climate change aggravates the problem by altering fungal species distributions. Due to the heavy financial burden of managing this infectious disease, farmers in developing countries face significant banana crop losses. Though scientists have produced mathematical models of infectious diseases, adapting these models to incorporate climate effects is difficult. We present MR. NODE (Multiple predictoR Neural ODE), a neural network that models the dynamics of black Sigatoka infection learnt directly from data via Neural Ordinary Differential Equations. Our method encodes external predictor factors into the latent space in addition to the variable that we infer, and it can also predict the infection risk at an arbitrary point in time. Empirically, we demonstrate on historical climate data that our method has superior generalization performance on time points up to one month in the future and unseen irregularities. We believe that our method can be a useful tool to control the spread of black Sigatoka.