ﻻ يوجد ملخص باللغة العربية
Risk modeling with EHR data is challenging due to a lack of direct observations on the disease outcome, and the high dimensionality of the candidate predictors. In this paper, we develop a surrogate assisted semi-supervised-learning (SAS) approach to risk modeling with high dimensional predictors, leveraging a large unlabeled data on candidate predictors and surrogates of outcome, as well as a small labeled data with annotated outcomes. The SAS procedure borrows information from surrogates along with candidate predictors to impute the unobserved outcomes via a sparse working imputation model with moment conditions to achieve robustness against mis-specification in the imputation model and a one-step bias correction to enable interval estimation for the predicted risk. We demonstrate that the SAS procedure provides valid inference for the predicted risk derived from a high dimensional working model, even when the underlying risk prediction model is dense and the risk model is mis-specified. We present an extensive simulation study to demonstrate the superiority of our SSL approach compared to existing supervised methods. We apply the method to derive genetic risk prediction of type-2 diabetes mellitus using a EHR biobank cohort.
We train and validate a semi-supervised, multi-task LSTM on 57,675 person-weeks of data from off-the-shelf wearable heart rate sensors, showing high accuracy at detecting multiple medical conditions, including diabetes (0.8451), high cholesterol (0.7
There are many scenarios such as the electronic health records where the outcome is much more difficult to collect than the covariates. In this paper, we consider the linear regression problem with such a data structure under the high dimensionality.
Distance correlation has become an increasingly popular tool for detecting the nonlinear dependence between a pair of potentially high-dimensional random vectors. Most existing works have explored its asymptotic distributions under the null hypothesi
We study high-dimensional regression with missing entries in the covariates. A common strategy in practice is to emph{impute} the missing entries with an appropriate substitute and then implement a standard statistical procedure acting as if the cova
In this article, we consider the sparse tensor singular value decomposition, which aims for dimension reduction on high-dimensional high-order data with certain sparsity structure. A method named Sparse Tensor Alternating Thresholding for Singular Va