ﻻ يوجد ملخص باللغة العربية
Leveraging health administrative data (HAD) datasets for predicting the risk of chronic diseases including diabetes has gained a lot of attention in the machine learning community recently. In this paper, we use the largest health records datasets of patients in Ontario,Canada. Provided by the Institute of Clinical Evaluative Sciences (ICES), this database is age, gender and ethnicity-diverse. The datasets include demographics, lab measurements,drug benefits, healthcare system interactions, ambulatory and hospitalizations records. We perform one of the first large-scale machine learning studies with this data to study the task of predicting diabetes in a range of 1-10 years ahead, which requires no additional screening of individuals.In the best setup, we reach a test AUC of 80.3 with a single-model trained on an observation window of 5 years with a one-year buffer using all datasets. A subset of top 15 features alone (out of a total of 963) could provide a test AUC of 79.1. In this paper, we provide extensive machine learning model performance and feature contribution analysis, which enables us to narrow down to the most important features useful for diabetes forecasting. Examples include chronic conditions such as asthma and hypertension, lab results, diagnostic codes in insurance claims, age and geographical information.
Big data generated from the Internet offer great potential for predictive analysis. Here we focus on using online users Internet search data to forecast unemployment initial claims weeks into the future, which provides timely insights into the direct
Improvements to Zambias malaria surveillance system allow better monitoring of incidence and targetting of responses at refined spatial scales. As transmission decreases, understanding heterogeneity in risk at fine spatial scales becomes increasingly
The need to forecast COVID-19 related variables continues to be pressing as the epidemic unfolds. Different efforts have been made, with compartmental models in epidemiology and statistical models such as AutoRegressive Integrated Moving Average (ARI
The recent advent of smart meters has led to large micro-level datasets. For the first time, the electricity consumption at individual sites is available on a near real-time basis. Efficient management of energy resources, electric utilities, and tra
Objective: To evaluate unsupervised clustering methods for identifying individual-level behavioral-clinical phenotypes that relate personal biomarkers and behavioral traits in type 2 diabetes (T2DM) self-monitoring data. Materials and Methods: We use