Predicting the risk of chronic diseases has become increasingly important in clinical practice. When a prediction model is developed in a given source cohort, there is often great interest in applying the model to other cohorts. However, because of potential discrepancies in baseline disease incidence between cohorts and shifts in patient composition, the risk predicted by the original model often under- or overestimates the risk in the new cohort. Such poorly calibrated predictions need to be corrected for proper medical decision-making. In this article, we assume that the relative risks of the predictors are the same in the two cohorts, and propose a novel weighted estimating equation approach to recalibrating the projected risk for the target population by updating the baseline risk. The recalibration leverages knowledge of the overall survival probabilities for the disease of interest and competing events, together with summary information on risk factors from the target population. The proposed recalibrated risk estimators gain efficiency if the risk factor distributions are the same in the source and target cohorts, and are robust, with little bias, if they differ. We establish the consistency and asymptotic normality of the proposed estimators. Extensive simulation studies demonstrate that the proposed estimators perform well in terms of robustness and efficiency in finite samples. An application to colorectal cancer risk prediction illustrates that the proposed method can be used in practice for model recalibration.
High-throughput microarray and sequencing technologies have been used to identify disease subtypes that could not be observed using clinical variables alone. The classical unsupervised clustering strategy is primarily concerned with identifying subpopulations that have similar patterns in gene features. However, because features corresponding to irrelevant confounders (e.g. gender or age) may dominate the clustering process, the resulting clusters may or may not capture clinically meaningful disease subtypes. This gives rise to a fundamental problem: can we find a subtyping procedure guided by a pre-specified disease outcome? Existing methods, such as supervised clustering, apply a two-stage approach and depend on an arbitrary number of selected features associated with the outcome. In this paper, we propose a unified latent generative model to perform outcome-guided disease subtyping from omics data, which improves the relevance of the resulting subtypes to the disease of interest. Feature selection is embedded in a regularized regression. A modified EM algorithm is applied for numerical computation and parameter estimation. The proposed method performs feature selection, latent subtype characterization and outcome prediction simultaneously. To account for possible outliers or violations of the Gaussian mixture assumption, we incorporate robust estimation using an adaptive Huber or median-truncated loss function. Extensive simulations and an application to complex lung diseases with transcriptomic and clinical data demonstrate the ability of the proposed method to identify clinically relevant disease subtypes and signature genes suitable for further exploration toward precision medicine.
Network meta-analysis (NMA) allows the combination of direct and indirect evidence from a set of randomized clinical trials. Performing NMA using individual patient data (IPD) is considered the gold standard approach, as it provides several advantages over NMA based on aggregate data. For example, it allows advanced modelling of covariates or covariate-by-treatment interactions. An important issue in IPD NMA is the selection of influential parameters among terms that account for inconsistency, covariates, covariate-by-treatment interactions or non-proportionality of treatment effects for time-to-event data. This issue has not yet been studied in depth in the literature, particularly for time-to-event data. A major difficulty is to jointly account for between-trial heterogeneity, which could have a major influence on the selection process. The use of penalized generalized mixed-effects models is one solution, but existing implementations have several shortcomings and a high computational cost that precludes their use for complex IPD NMA. In this article, we propose a penalized Poisson regression model to perform IPD NMA of time-to-event data. It is based only on fixed-effect parameters, which reduces its computational cost compared with the use of random effects. It can easily be implemented using existing penalized regression packages. Computer code is shared for implementation. The method was applied to simulated data to illustrate the importance of taking between-trial heterogeneity into account during the selection procedure. Finally, it was applied to an IPD NMA of overall survival with chemotherapy and radiotherapy in nasopharyngeal carcinoma.
This paper introduces a unified framework of counterfactual estimation for time-series cross-sectional data, which estimates the average treatment effect on the treated by directly imputing treated counterfactuals. Examples include the fixed effects counterfactual estimator, the interactive fixed effects counterfactual estimator, and the matrix completion estimator. These estimators provide more reliable causal estimates than conventional two-way fixed effects models when treatment effects are heterogeneous or unobserved time-varying confounders exist. Under this framework, we propose a new dynamic treatment effects plot, as well as several diagnostic tests, to help researchers gauge the validity of the identifying assumptions. We illustrate these methods with two political economy examples and develop an open-source package, fect, in both R and Stata to facilitate implementation.
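The imputation idea behind the fixed effects counterfactual estimator can be sketched as follows. This is a minimal illustration under stated assumptions, not the fect package's implementation: the function name fe_counterfactual_att, the simulated panel, and the additive model with unit and period effects and no covariates are all hypothetical choices made for the example. The model is fitted on untreated observations only, untreated outcomes are imputed for treated cells, and the observed-minus-imputed differences are averaged.

```python
import numpy as np

def fe_counterfactual_att(Y, D):
    """Hypothetical helper: Y and D are (N, T) arrays, D is 1 for treated cells."""
    N, T = Y.shape
    unit = np.repeat(np.arange(N), T)   # unit index of each flattened cell
    time = np.tile(np.arange(T), N)     # period index of each flattened cell
    # Design matrix: one dummy per unit plus one per period, with the first
    # period dropped to avoid collinearity with the unit dummies.
    X = np.zeros((N * T, N + T - 1))
    X[np.arange(N * T), unit] = 1.0
    later = time > 0
    X[np.flatnonzero(later), N + time[later] - 1] = 1.0
    y = Y.ravel()
    control = D.ravel() == 0
    # Step 1: fit unit and period effects on untreated observations only.
    beta, *_ = np.linalg.lstsq(X[control], y[control], rcond=None)
    # Step 2: impute the untreated potential outcome for every cell.
    y0_hat = X @ beta
    # Step 3: average observed minus imputed outcomes over treated cells.
    return float(np.mean(y[~control] - y0_hat[~control]))

# Simulated panel (assumption): half the units are treated from period 6 on,
# with a constant effect tau.
rng = np.random.default_rng(0)
N, T, tau = 40, 10, 2.0
alpha = rng.normal(size=(N, 1))         # unit effects
xi = rng.normal(size=(1, T))            # period effects
D = np.zeros((N, T), dtype=int)
D[: N // 2, 6:] = 1
Y = alpha + xi + tau * D + 0.1 * rng.normal(size=(N, T))
att_hat = fe_counterfactual_att(Y, D)
print(round(att_hat, 2))
```

The sketch requires every unit and every period to have at least one untreated observation; otherwise the corresponding fixed effect cannot be estimated from the untreated cells.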
We introduce a new class of semiparametric latent variable models for long-memory discretized event data. The proposed methodology is motivated by a study of bird vocalizations in the Amazon rainforest; the timings of vocalizations exhibit self-similarity and long-range dependence, ruling out models based on Poisson processes. The proposed class of FRActional Probit (FRAP) models is based on thresholding a latent process consisting of an additive expansion of a smooth Gaussian process with a fractional Brownian motion. We develop a Bayesian approach to inference using Markov chain Monte Carlo, and show good performance in simulation studies. Applying the methods to the Amazon bird vocalization data, we find substantial evidence for self-similarity and non-Markovian/Poisson dynamics. To accommodate the bird vocalization data, in which many different species of birds exhibit their own vocalization dynamics, a hierarchical expansion of FRAP is provided in the Supplementary Materials.
Using copulas to model the dependence among variables relaxes the multivariate Gaussian assumption. In this paper, we first empirically study a copula regression model with a continuous response; both a simulation study and a real-data study are given. Second, we propose a novel copula regression model with a binary outcome, together with a score-gradient estimation algorithm to fit the model. Both a simulation study and a real-data study are given for our model and fitting algorithm.
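To illustrate how a copula separates dependence from the marginal distributions, the sketch below draws from a bivariate Gaussian copula and attaches non-Gaussian margins. It is only an illustration under assumptions: the abstract does not say which copula family the paper uses, and the function name gaussian_copula_sample, the correlation 0.7, and the exponential and logistic margins are hypothetical choices.

```python
import math
import numpy as np

def gaussian_copula_sample(n, rho, rng):
    """Hypothetical helper: n draws of (U1, U2) from a Gaussian copula."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    # The standard normal CDF maps each coordinate to Uniform(0, 1) while
    # keeping the dependence structure of the latent Gaussian vector.
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

rng = np.random.default_rng(1)
u = gaussian_copula_sample(5000, 0.7, rng)

# Arbitrary margins via inverse CDFs (assumed for illustration).
x1 = -np.log1p(-u[:, 0])                   # Exponential(1) margin
x2 = np.log(u[:, 1]) - np.log1p(-u[:, 1])  # standard logistic margin

print(x1.mean(), np.corrcoef(x1, x2)[0, 1])
```

The variables x1 and x2 are dependent but neither is Gaussian, which is the sense in which a copula model relaxes the multivariate Gaussian assumption.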