Survival analysis is a challenging variation of regression modeling because of the presence of censoring, where the outcome measurement is only partially known, for example due to loss to follow-up. Such problems arise frequently in medical applications, making survival analysis a key endeavor in biostatistics and machine learning for healthcare, with Cox regression models amongst the most commonly employed. We describe a new approach to survival analysis regression, based on learning mixtures of Cox regressions to model individual survival distributions. We propose an approximation to the Expectation-Maximization algorithm for this model that makes hard assignments to mixture groups to keep optimization efficient. Given the group assignments, we fit the hazard ratios within each group using deep neural networks, and the baseline hazard of each mixture component non-parametrically. We perform experiments on multiple real-world datasets and examine the mortality rates of patients across ethnicity and gender. We emphasize the importance of calibration in healthcare settings and demonstrate that our approach outperforms classical and modern survival analysis baselines, both in terms of discriminative performance and calibration, with large gains in performance on minority demographics.
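A minimal sketch of the hard-assignment EM idea described above is given below, with two deliberate simplifications relative to the paper: linear risk scores stand in for the deep neural networks, and an exponential baseline hazard per component stands in for the nonparametric baseline. All variable names and the toy data are illustrative assumptions, not the authors' implementation.

```python
# Hard-EM for a mixture of proportional-hazards regressions (toy sketch).
# Assumptions: linear risk scores, exponential per-component baseline hazard.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def component_loglik(params, X, t, d):
    """Per-subject log-likelihood under h(t|x) = lam * exp(x @ beta)."""
    beta, log_lam = params[:-1], params[-1]
    risk = X @ beta
    lam = np.exp(log_lam)
    return d * (log_lam + risk) - lam * np.exp(risk) * t

def fit_component(X, t, d, p):
    """M-step: maximize the summed log-likelihood of one component."""
    res = minimize(lambda w: -component_loglik(w, X, t, d).sum(),
                   np.zeros(p + 1), method="L-BFGS-B")
    return res.x

def hard_em(X, t, d, K=2, n_iter=20):
    n, p = X.shape
    z = rng.integers(0, K, size=n)            # random initial hard assignments
    params = [fit_component(X[z == k], t[z == k], d[z == k], p) for k in range(K)]
    for _ in range(n_iter):
        # E-step (hard): assign each subject to its best-scoring component.
        scores = np.stack([component_loglik(params[k], X, t, d) for k in range(K)], axis=1)
        z = scores.argmax(axis=1)
        # M-step: refit each component on its assigned subjects.
        params = [fit_component(X[z == k], t[z == k], d[z == k], p) for k in range(K)]
    return z, params

# Toy data: two latent groups with different baseline rates, ~20% censoring.
X = rng.normal(size=(400, 3))
z_true = rng.integers(0, 2, size=400)
rate = np.exp(X @ np.array([0.5, -0.5, 0.0])) * np.where(z_true == 0, 0.2, 1.0)
t = rng.exponential(1.0 / rate)
cutoff = np.quantile(t, 0.8)
d = (t < cutoff).astype(float)
t = np.minimum(t, cutoff)
z_hat, fitted = hard_em(X, t, d, K=2)
```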
We apply Gaussian process (GP) regression, which provides a powerful non-parametric probabilistic method of relating inputs to outputs, to survival data consisting of time-to-event and covariate measurements. In this context, the covariates are regarded as the 'inputs' and the event times as the 'outputs'. This allows for highly flexible inference of non-linear relationships between covariates and event times. Many existing methods, such as the ubiquitous Cox proportional hazards model, focus primarily on the hazard rate, which is typically assumed to take some parametric or semi-parametric form. Our proposed model belongs to the class of accelerated failure time models, where we focus on directly characterising the relationship between covariates and event times without any explicit assumptions on what form the hazard rates take. It is straightforward to include various types and combinations of censored and truncated observations. We apply our approach to both simulated and experimental data. We then apply multiple-output GP regression, which can handle multiple potentially correlated outputs for each input, to competing-risks survival data where multiple event types can occur. By tuning one of the model parameters we can control the extent to which the multiple outputs (the time-to-event for each risk) are dependent, thus allowing the specification of correlated risks. Simulation studies suggest that in some cases assuming dependence can lead to more accurate predictions.
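As a rough illustration of the accelerated-failure-time viewpoint, the sketch below fits a GP to log event times with covariates as inputs. For simplicity it uses only uncensored observations, whereas the approach described above incorporates censored and truncated data directly in the likelihood; the kernel choice, dataset, and censoring mechanism are illustrative assumptions.

```python
# GP regression of log event times on covariates (uncensored subjects only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)

# Toy survival data: one covariate, log-times with a non-linear trend.
X = rng.uniform(0, 4, size=(200, 1))
log_t = np.sin(X[:, 0]) + 0.5 * X[:, 0] + rng.normal(scale=0.3, size=200)
event = rng.uniform(size=200) < 0.7            # ~30% of subjects censored

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X[event], log_t[event])                 # simplification: drop censored rows

# Predictive mean and uncertainty of the log event time at new covariate values.
X_new = np.linspace(0, 4, 50).reshape(-1, 1)
mean, std = gp.predict(X_new, return_std=True)
```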
Motion analysis is used in computer vision to understand the behaviour of moving objects in sequences of images. Optimising the interpretation of dynamic biological systems requires accurate and precise motion tracking as well as efficient representations of high-dimensional motion trajectories so that these can be used for prediction tasks. Here we use image sequences of the heart, acquired using cardiac magnetic resonance imaging, to create time-resolved three-dimensional segmentations using a fully convolutional network trained on anatomical shape priors. This dense motion model formed the input to a supervised denoising autoencoder (4Dsurvival), a hybrid network whose autoencoder learns a task-specific latent code representation trained on observed outcome data, yielding a latent representation optimised for survival prediction. To handle right-censored survival outcomes, our network used a Cox partial likelihood loss function. In a study of 302 patients the predictive accuracy (quantified by Harrell's C-index) was significantly higher (p < .0001) for our model, C = 0.73 (95% CI: 0.68-0.78), than the human benchmark of C = 0.59 (95% CI: 0.53-0.65). This work demonstrates how a complex computer vision task using high-dimensional medical image data can efficiently predict human survival.
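The right-censoring-aware training objective can be illustrated with a negative Cox partial log-likelihood implemented as a differentiable loss. In the sketch below, a small MLP stands in (as an assumption) for the latent representation produced by the denoising autoencoder; tensor names and sizes are illustrative.

```python
# Negative Cox partial log-likelihood as a training loss for censored outcomes.
import torch

def neg_cox_partial_loglik(risk, time, event):
    """risk: (n,) predicted log-hazard ratios; time: (n,) follow-up times;
    event: (n,) 1 if the event was observed, 0 if censored."""
    order = torch.argsort(time, descending=True)   # sort so each risk set is a prefix
    risk, event = risk[order], event[order]
    log_cumsum = torch.logcumsumexp(risk, dim=0)   # log sum of exp(risk) over the risk set
    return -((risk - log_cumsum) * event).sum() / event.sum()

# Toy usage with a small MLP standing in for the autoencoder's latent code.
torch.manual_seed(0)
x = torch.randn(128, 16)
time = torch.rand(128)
event = (torch.rand(128) < 0.6).float()
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    loss = neg_cox_partial_loglik(model(x).squeeze(-1), time, event)
    opt.zero_grad(); loss.backward(); opt.step()
```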
This paper presents a new approach to robust Gaussian process (GP) regression. Most existing approaches replace an outlier-prone Gaussian likelihood with a non-Gaussian likelihood induced from a heavy-tailed distribution, such as the Laplace or Student-t distribution. However, the use of a non-Gaussian likelihood requires computationally expensive approximate Bayesian computation for posterior inference. The proposed approach models an outlier as a noisy and biased observation of an unknown regression function, so the likelihood contains bias terms that explain the degree of deviation from the regression function. We detail how the biases can be estimated accurately, together with the other hyperparameters, by regularized maximum likelihood estimation. Conditioned on the bias estimates, the robust GP regression reduces to a standard GP regression problem with analytical forms for the predictive mean and variance. The proposed approach is therefore simple and computationally attractive, and it yields robust and accurate GP estimates across many tested scenarios. For the numerical evaluation, we perform a comprehensive simulation study comparing the proposed approach with existing robust GP approaches under various simulated scenarios with different outlier proportions and noise levels. The approach is also applied to data from two measurement systems, where the predictors are based on robust environmental parameter measurements and the response variables come from more complex chemical sensing methods that contain a certain percentage of outliers. The utility of the measurement systems and the value of the environmental data are improved through the computationally efficient GP regression and bias model.
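One way to picture the bias-term idea is the crude alternating scheme below: fit a standard GP to de-biased targets, then soft-threshold the residuals to update per-point biases. This is only an illustrative stand-in, not the paper's joint regularized maximum likelihood estimation; the kernel, penalty level, and toy data are assumptions.

```python
# Toy sketch: robust GP via per-observation biases with an L1-style penalty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)
X = np.linspace(0, 6, 120).reshape(-1, 1)
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=120)
y[rng.choice(120, size=8, replace=False)] += 3.0      # inject a few outliers

def soft_threshold(r, lam):
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

b = np.zeros_like(y)                                   # estimated per-point biases
for _ in range(5):
    gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(0.01), normalize_y=True)
    gp.fit(X, y - b)                                   # standard GP on de-biased targets
    mu = gp.predict(X)
    b = soft_threshold(y - mu, lam=0.5)                # only large residuals become biases

mean, std = gp.predict(X, return_std=True)             # robust predictive estimates
```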
Point processes in time have a wide range of applications, including the claims arrival process in insurance and the analysis of queues in operations research. Due to advances in technology, samples of such point processes are increasingly encountered. A key object of interest is the local intensity function, which has a straightforward interpretation that allows one to understand and explore point process data. We consider functional approaches for point processes, where one has a sample of repeated realizations of the point process. This situation is inherently connected with Cox processes, where the intensity functions of the replications are modeled as random functions. Here we study a situation where one records covariates for each replication of the process, such as the daily temperature for bike rentals. For modeling point processes as responses with vector covariates as predictors, we propose a novel regression approach for the intensity function that is intrinsically nonparametric. While the intensity function of a point process that is only observed once on a fixed domain cannot be identified, we show how covariates and repeated observations of the process can be utilized to make consistent estimation possible, and we derive asymptotic rates of convergence without invoking parametric assumptions.
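To make the setting concrete, the toy sketch below smooths each replication's event times into a discretized intensity estimate and then regresses the intensity at each grid point on a scalar covariate. The pointwise linear fit is a simplification of the intrinsically nonparametric regression described above, and the simulated data, bandwidth, and names are illustrative assumptions.

```python
# Replicated point processes: smoothed intensity estimates regressed on a covariate.
import numpy as np

rng = np.random.default_rng(3)
grid = np.linspace(0, 1, 50)
n_rep = 200
covariate = rng.uniform(size=n_rep)                    # e.g., daily temperature

def smoothed_intensity(events, grid, bw=0.05):
    """Gaussian-kernel estimate of the intensity on a grid from one replication."""
    if len(events) == 0:
        return np.zeros_like(grid)
    diffs = (grid[:, None] - events[None, :]) / bw
    return np.exp(-0.5 * diffs**2).sum(axis=1) / (bw * np.sqrt(2 * np.pi))

# Simulate replications of a homogeneous Poisson process on [0, 1] whose rate
# grows with the covariate, then estimate each replication's intensity.
intensities = []
for c in covariate:
    n = rng.poisson(20 * (1 + c))
    events = rng.uniform(size=n) if n > 0 else np.array([])
    intensities.append(smoothed_intensity(events, grid))
Y = np.stack(intensities)                              # (n_rep, n_grid)

# Pointwise least-squares regression of intensity on the covariate.
Xd = np.column_stack([np.ones(n_rep), covariate])
coef, *_ = np.linalg.lstsq(Xd, Y, rcond=None)          # intercept and slope curves
```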
Efficient and theoretically sound uncertainty quantification is crucial for building trust in deep learning models for critical real-world applications, yet it remains challenging. Useful uncertainty information is expected to have two key properties: it should be valid (guaranteeing coverage) and discriminative (more uncertain when the expected risk is high). Moreover, when combined with deep learning (DL) methods, it should be scalable and affect DL model performance minimally. Most existing Bayesian methods lack frequentist coverage guarantees and usually affect model performance. The few available frequentist methods are rarely discriminative and/or violate coverage guarantees due to unrealistic assumptions. Moreover, many methods are expensive or require substantial modifications to the base neural network. Building upon recent advances in conformal prediction [13, 31] and leveraging the classical idea of kernel regression, we propose Locally Valid and Discriminative predictive intervals (LVD), a simple, efficient, and lightweight method to construct discriminative predictive intervals (PIs) for almost any DL model. With no assumptions on the data distribution, such PIs also offer finite-sample local coverage guarantees (in contrast to the simpler marginal coverage). We empirically verify, using diverse datasets, that besides being the only locally valid method, LVD exceeds or matches the performance (including coverage rate and prediction accuracy) of existing uncertainty quantification methods, while offering additional benefits in scalability and flexibility.
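A kernel-weighted split-conformal construction conveys the flavour of combining conformal prediction with kernel regression: calibration residuals are weighted by kernel similarity to the test point, so interval widths adapt locally. The sketch below is not LVD itself and does not inherit its exact finite-sample local guarantees; the base model, bandwidth, and data are illustrative assumptions.

```python
# Kernel-weighted split-conformal predictive intervals (illustrative sketch).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(1500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1 + 0.2 * (X[:, 0] > 0), size=1500)  # heteroscedastic noise

X_tr, y_tr = X[:800], y[:800]
X_cal, y_cal = X[800:1200], y[800:1200]
X_te = X[1200:]

model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0).fit(X_tr, y_tr)
resid = np.abs(y_cal - model.predict(X_cal))           # calibration nonconformity scores

def weighted_quantile(scores, weights, q):
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    cdf = np.cumsum(w) / w.sum()
    return s[np.searchsorted(cdf, q)]

def predict_interval(x, alpha=0.1, bw=0.5):
    w = np.exp(-0.5 * ((X_cal - x) ** 2).sum(axis=1) / bw**2)  # kernel weights
    qhat = weighted_quantile(resid, w, 1 - alpha)
    mu = model.predict(x.reshape(1, -1))[0]
    return mu - qhat, mu + qhat

lo, hi = zip(*(predict_interval(x) for x in X_te))     # locally adaptive intervals
```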