A Bayesian Approach to Modelling Longitudinal Data in Electronic Health Records

102 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Alexis Bellot

تاريخ النشر 2019

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Alexis Bellot - Mihaela van der Schaar

التعلم الالي التعلم الآلي تطبيقات الإحصاء

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Analyzing electronic health records (EHR) poses significant challenges because often few samples are available describing a patients health and, when available, their information content is highly diverse. The problem we consider is how to integrate sparsely sampled longitudinal data, missing measurements informative of the underlying health status and fixed demographic information to produce estimated survival distributions updated through a patients follow up. We propose a nonparametric probabilistic model that generates survival trajectories from an ensemble of Bayesian trees that learns variable interactions over time without specifying beforehand the longitudinal process. We show performance improvements on Primary Biliary Cirrhosis patient data.

قيم البحث

61 - Daisy Yi Ding , Chloe Simpson , Stephen Pfohl 2018

Electronic phenotyping is the task of ascertaining whether an individual has a medical condition of interest by analyzing their medical record and is foundational in clinical informatics. Increasingly, electronic phenotyping is performed via supervis ed learning. We investigate the effectiveness of multitask learning for phenotyping using electronic health records (EHR) data. Multitask learning aims to improve model performance on a target task by jointly learning additional auxiliary tasks and has been used in disparate areas of machine learning. However, its utility when applied to EHR data has not been established, and prior work suggests that its benefits are inconsistent. We present experiments that elucidate when multitask learning with neural nets improves performance for phenotyping using EHR data relative to neural nets trained for a single phenotype and to well-tuned logistic regression baselines. We find that multitask neural nets consistently outperform single-task neural nets for rare phenotypes but underperform for relatively more common phenotypes. The effect size increases as more auxiliary tasks are added. Moreover, multitask learning reduces the sensitivity of neural nets to hyperparameter settings for rare phenotypes. Last, we quantify phenotype complexity and find that neural nets trained with or without multitask learning do not improve on simple baselines unless the phenotypes are sufficiently complex.

التعلم الالي التعلم الآلي

Modeling Heterogeneity and Missing Data of Multiple Longitudinal Outcomes in Electronic Health Records

235 - Rebecca Anthopolos , Ying Wei , Qixuan Chen 2021

In electronic health records (EHRs), latent subgroups of patients may exhibit distinctive patterning in their longitudinal health trajectories. For such data, growth mixture models (GMMs) enable classifying patients into different latent classes base d on individual trajectories and hypothesized risk factors. However, the application of GMMs is hindered by the special missing data problem in EHRs, which manifests two patient-led missing data processes: the visit process and the response process for an EHR variable conditional on a patient visiting the clinic. If either process is associated with the process generating the longitudinal outcomes, then valid inferences require accounting for a nonignorable missing data mechanism. We propose a Bayesian shared parameter model that links GMMs of multiple longitudinal health outcomes, the visit process, and the response process of each outcome given a visit using a discrete latent class variable. Our focus is on multiple longitudinal health outcomes for which there can be a clinically prescribed visit schedule. We demonstrate our model in EHR measurements on early childhood weight and height z-scores. Using data simulations, we illustrate the statistical properties of our method with respect to subgroup-specific or marginal inferences. We built the R package EHRMiss for model fitting, selection, and checking.

المنهجية تطبيقات الإحصاء

Learning Multimorbidity Patterns from Electronic Health Records Using Non-negative Matrix Factorisation

345 - Abdelaali Hassaine , Dexter Canoy , Jose Roberto Ayala Solares 2019

Multimorbidity, or the presence of several medical conditions in the same individual, has been increasing in the population, both in absolute and relative terms. However, multimorbidity remains poorly understood, and the evidence from existing resear ch to describe its burden, determinants and consequences has been limited. Previous studies attempting to understand multimorbidity patterns are often cross-sectional and do not explicitly account for multimorbidity patterns evolution over time; some of them are based on small datasets and/or use arbitrary and narrow age ranges; and those that employed advanced models, usually lack appropriate benchmarking and validations. In this study, we (1) introduce a novel approach for using Non-negative Matrix Factorisation (NMF) for temporal phenotyping (i.e., simultaneously mining disease clusters and their trajectories); (2) provide quantitative metrics for the evaluation of disease clusters from such studies; and (3) demonstrate how the temporal characteristics of the disease clusters that result from our model can help mine multimorbidity networks and generate new hypotheses for the emergence of various multimorbidity patterns over time. We trained and evaluated our models on one of the worlds largest electronic health records (EHR), with 7 million patients, from which over 2 million where relevant to this study.

التعلم الالي التعلم الآلي

Deep Bayesian Gaussian Processes for Uncertainty Estimation in Electronic Health Records

408 - Yikuan Li , Shishir Rao , Abdelaali Hassaine 2020

One major impediment to the wider use of deep learning for clinical decision making is the difficulty of assigning a level of confidence to model predictions. Currently, deep Bayesian neural networks and sparse Gaussian processes are the main two sca lable uncertainty estimation methods. However, deep Bayesian neural network suffers from lack of expressiveness, and more expressive models such as deep kernel learning, which is an extension of sparse Gaussian process, captures only the uncertainty from the higher level latent space. Therefore, the deep learning model under it lacks interpretability and ignores uncertainty from the raw data. In this paper, we merge features of the deep Bayesian learning framework with deep kernel learning to leverage the strengths of both methods for more comprehensive uncertainty estimation. Through a series of experiments on predicting the first incidence of heart failure, diabetes and depression applied to large-scale electronic medical records, we demonstrate that our method is better at capturing uncertainty than both Gaussian processes and deep Bayesian neural networks in terms of indicating data insufficiency and distinguishing true positive and false positive predictions, with a comparable generalisation performance. Furthermore, by assessing the accuracy and area under the receiver operating characteristic curve over the predictive probability, we show that our method is less susceptible to making overconfident predictions, especially for the minority class in imbalanced datasets. Finally, we demonstrate how uncertainty information derived by the model can inform risk factor analysis towards model interpretability.

التعلم الآلي التعلم الالي

Scalable and accurate deep learning for electronic health records

111 - Alvin Rajkomar , Eyal Oren , Kai Chen 2018

Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from no rmalized EHR data, a labor-intensive process that discards the vast majority of information in each patients record. We propose a representation of patients entire, raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two U.S. academic medical centers with 216,221 adult patients hospitalized for at least 24 hours. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting in-hospital mortality (AUROC across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patients final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed state-of-the-art traditional predictive models in all cases. We also present a case-study of a neural-network attribution system, which illustrates how clinicians can gain some transparency into the predictions. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios, complete with explanations that directly highlight evidence in the patients chart.

أجهزة الكمبيوتر والمجتمع التعلم الآلي