
Diabetes Mellitus Forecasting Using Population Health Data in Ontario, Canada

Published by: Hamed Sadeghi
Publication date: 2019
Language: English





Leveraging health administrative data (HAD) datasets for predicting the risk of chronic diseases, including diabetes, has recently gained considerable attention in the machine learning community. In this paper, we use the largest health records datasets of patients in Ontario, Canada. Provided by the Institute for Clinical Evaluative Sciences (ICES), this database is diverse in age, gender, and ethnicity. The datasets include demographics, lab measurements, drug benefits, healthcare system interactions, and ambulatory and hospitalization records. We perform one of the first large-scale machine learning studies with these data to predict diabetes 1-10 years ahead, requiring no additional screening of individuals. In the best setup, we reach a test AUC of 80.3 with a single model trained on an observation window of 5 years with a one-year buffer, using all datasets. A subset of the top 15 features alone (out of a total of 963) provides a test AUC of 79.1. We report extensive model performance and feature contribution analyses, which allow us to narrow down the features most useful for diabetes forecasting. Examples include chronic conditions such as asthma and hypertension, lab results, diagnostic codes in insurance claims, age, and geographical information.
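As a rough illustration of the setup described above (not the authors' code; the ICES data are not public, so every feature and label below is a synthetic stand-in), a single gradient-boosted classifier trained on tabular features, scored by test AUC, and then reduced to its top-ranked features might look like this:

```python
# Hypothetical sketch (not the authors' code): a single gradient-boosted
# model on tabular features, evaluated by test AUC, then reduced to its
# top-ranked features. All data here are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 5000, 20                       # patients, features (963 in the paper)
X = rng.normal(size=(n, d))           # stand-ins for demographics, labs, claims
logits = X[:, 0] + 0.5 * X[:, 1] - 0.3 * X[:, 2]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))   # synthetic onset label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)
print("test AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

# Rank features and retrain on a small subset, mirroring the paper's
# finding that the top 15 of 963 features retain most of the performance.
top = np.argsort(model.feature_importances_)[::-1][:5]
small = GradientBoostingClassifier().fit(X_tr[:, top], y_tr)
print("top-k AUC:", roc_auc_score(y_te, small.predict_proba(X_te[:, top])[:, 1]))
```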


Read also

Big data generated from the Internet offer great potential for predictive analysis. Here we focus on using online users' Internet search data to forecast unemployment initial claims weeks into the future, providing timely insights into the direction of the economy. To this end, we present a novel method, PRISM (Penalized Regression with Inferred Seasonality Module), which uses publicly available online search data from Google. PRISM is a semi-parametric method, motivated by a general state-space formulation, that employs nonparametric seasonal decomposition and penalized regression. For forecasting unemployment initial claims, PRISM outperforms all previously available methods, including for forecasting during the 2008-2009 financial crisis and near-future forecasting during the COVID-19 pandemic, two periods when unemployment initial claims rose rapidly. The timely and accurate unemployment forecasts produced by PRISM could aid government agencies and financial institutions in assessing economic trends and making well-informed decisions, especially in the face of economic turbulence.
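The abstract names two ingredients: nonparametric seasonal decomposition and penalized regression. A minimal sketch of how they can be combined, on synthetic weekly claims with stand-in search-volume series (this is not the authors' PRISM implementation):

```python
# Sketch of the two ingredients PRISM combines (not the authors' code):
# nonparametric seasonal decomposition of the claims series, then an
# L1-penalized regression on lagged values and search-volume features.
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(1)
t = np.arange(260)                                     # five years of weeks
weeks = pd.date_range("2015-01-04", periods=260, freq="W")
claims = pd.Series(100 + 10 * np.sin(2 * np.pi * t / 52) + 0.05 * t
                   + rng.normal(0, 2, 260), index=weeks)
search = np.column_stack([claims.to_numpy() + rng.normal(0, 2, 260)
                          for _ in range(5)])          # stand-in Google queries

# 1) infer the seasonal component nonparametrically and remove it
stl = STL(claims, period=52).fit()
deseason = (claims - stl.seasonal).to_numpy()

# 2) penalized regression of next week's deseasonalized claims on this
#    week's value and search volumes
X = np.column_stack([deseason[:-1], search[:-1]])
model = Lasso(alpha=0.1).fit(X, deseason[1:])

# one-week-ahead forecast: add back the matching seasonal term
x_new = np.concatenate([[deseason[-1]], search[-1]]).reshape(1, -1)
pred = model.predict(x_new)[0] + stl.seasonal.iloc[-52]  # same week last year
print(f"one-week-ahead forecast: {pred:.1f}")
```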
Improvements to Zambia's malaria surveillance system allow better monitoring of incidence and targeting of responses at refined spatial scales. As transmission decreases, understanding heterogeneity in risk at fine spatial scales becomes increasingly important. However, there are challenges in using health system data for high-resolution risk mapping: health facilities have undefined and overlapping catchment areas, and they report on an inconsistent basis. We propose a novel inferential framework for risk mapping of malaria incidence based on formal down-scaling of confirmed case data reported through the health system in Zambia. We combine data from large community intervention trials in 2011-2016 and model health facility catchments based upon treatment-seeking behaviours; our model for monthly incidence is an aggregated log-Gaussian Cox process, which allows us to predict incidence at fine scale. We predicted monthly malaria incidence at 5 km$^2$ resolution nationally: whereas 4.8 million malaria cases were reported through the health system in 2016, we estimated that the number of cases occurring at the community level was closer to 10 million. As Zambia continues to scale up community-based reporting of malaria incidence, these outputs provide realistic estimates of community-level malaria burden as well as high-resolution risk maps for targeting interventions at the sub-catchment level.
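For intuition about the model class named here, the toy forward simulation below shows how an aggregated log-Gaussian Cox process generates facility-level counts: a latent Gaussian field defines a log-intensity on a grid, and each facility reports a Poisson count summed over its catchment cells (an illustrative sketch only, not the authors' inferential framework; catchments here are assigned at random):

```python
# Toy forward simulation of an aggregated log-Gaussian Cox process.
import numpy as np

rng = np.random.default_rng(2)
n = 20                                       # the region is an n x n grid
xs, ys = np.meshgrid(np.arange(n), np.arange(n))
coords = np.column_stack([xs.ravel(), ys.ravel()])

# squared-exponential covariance for the latent field (length scale 4 cells)
d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / (2 * 4.0 ** 2)) + 1e-8 * np.eye(n * n)
field = rng.multivariate_normal(np.zeros(n * n), K)

intensity = np.exp(-1.0 + field)             # expected cases per cell per month
catchment = rng.integers(0, 4, size=n * n)   # crude assignment of cells to 4 facilities

for f in range(4):
    lam = intensity[catchment == f].sum()    # intensity aggregated over the catchment
    print(f"facility {f}: expected {lam:.1f} cases, reported {rng.poisson(lam)}")
```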
The need to forecast COVID-19 related variables continues to be pressing as the epidemic unfolds. Different efforts have been made, with compartmental models in epidemiology and statistical models such as AutoRegressive Integrated Moving Average (ARIMA), Exponential Smoothing (ETS), or computational intelligence models. These efforts have proved useful in some instances by allowing decision makers to distinguish different scenarios during the emergency, but their accuracy has been disappointing, forecasts ignore uncertainties, and less attention is given to local areas. In this study, we propose a simple Multiple Linear Regression model, optimised to use call data to forecast the number of daily confirmed cases. Moreover, we produce a probabilistic forecast that allows decision makers to better deal with risk. Our proposed approach outperforms ARIMA, ETS, and a regression model without call data, as evaluated by three point-forecast error metrics, one prediction interval measure, and two probabilistic forecast accuracy measures. The simplicity, interpretability, and reliability of the model, obtained through a careful forecasting exercise, are a meaningful contribution for decision makers at the local level, who acutely need to organise resources in already strained health services. We hope that this model will serve as a building block for other forecasting efforts that, on the one hand, help front-line personnel and decision makers at the local level and, on the other, facilitate communication with other modelling efforts at the national level, improving the way we tackle this pandemic and similar future challenges.
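A minimal sketch of the general idea (the actual predictors and interval construction are assumptions): regress daily confirmed cases on lagged cases and lagged call volumes, then form an empirical prediction interval from the in-sample residuals to obtain a probabilistic forecast:

```python
# Minimal sketch on synthetic data: OLS of daily cases on lagged cases
# and lagged call volumes, plus an empirical 90% prediction interval.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
T, lag = 200, 5
calls = rng.poisson(50, T).astype(float)
cases = 0.4 * np.roll(calls, lag) + rng.normal(0, 3, T)  # calls lead cases

X = np.column_stack([cases[:-lag], calls[:-lag]])        # predictors at t - lag
y = cases[lag:]
model = LinearRegression().fit(X, y)
resid = y - model.predict(X)

point = model.predict(np.array([[cases[-1], calls[-1]]]))[0]
lo, hi = point + np.quantile(resid, [0.05, 0.95])        # empirical 90% interval
print(f"{lag}-day-ahead forecast: {point:.1f} (90% PI: {lo:.1f} to {hi:.1f})")
```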
The recent advent of smart meters has led to large micro-level datasets. For the first time, the electricity consumption at individual sites is available on a near real-time basis. Efficient management of energy resources, electric utilities, and transmission grids can be greatly facilitated by harnessing the potential of this data. The aim of this study is to generate probability density estimates for consumption recorded by individual smart meters. Such estimates can assist decision making by helping consumers identify and minimize their excess electricity usage, especially during peak times. For suppliers, these estimates can be used to devise innovative time-of-use pricing strategies aimed at their target consumers. We consider methods based on conditional kernel density (CKD) estimation with the incorporation of a decay parameter. The methods capture the seasonality in consumption and enable a nonparametric estimation of its conditional density. Using eight months of half-hourly data for one thousand meters, we evaluate point and density forecasts for lead times ranging from one half-hour up to a week ahead. We find that the kernel-based methods outperform a simple benchmark that does not account for seasonality, and they compare well with an exponential smoothing method that we use as a sophisticated benchmark. To gauge the financial impact, we use density estimates of consumption to derive prediction intervals of electricity cost for different time-of-use tariffs. We show that a simple strategy of switching between tariffs, based on a comparison of cost densities, delivers significant cost savings for the great majority of consumers.
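A simplified reading of the CKD-with-decay idea (not the authors' implementation): to estimate the density of consumption at a given half-hour of the week, pool past observations from that same period and down-weight older ones exponentially. A sketch on synthetic data:

```python
# Decay-weighted Gaussian kernel density for one half-hour of the week,
# a simplified stand-in for the CKD methods described in the abstract.
import numpy as np

def ckd_density(history, periods, target_period, grid, bandwidth=0.2, decay=0.98):
    """Density estimate from past observations of one weekly period."""
    obs = history[periods == target_period]
    w = decay ** np.arange(len(obs))[::-1]   # most recent observation: weight 1
    w = w / w.sum()
    kernels = np.exp(-0.5 * ((grid[None, :] - obs[:, None]) / bandwidth) ** 2)
    return (w[:, None] * kernels).sum(0) / (bandwidth * np.sqrt(2 * np.pi))

rng = np.random.default_rng(4)
n_weeks, ppw = 30, 336                        # 336 half-hour periods per week
periods = np.tile(np.arange(ppw), n_weeks)
history = rng.gamma(2.0, 0.3, n_weeks * ppw)  # synthetic kWh per half-hour

grid = np.linspace(0, 3, 200)
dens = ckd_density(history, periods, target_period=36, grid=grid)
print("integrates to ~1:", dens.sum() * (grid[1] - grid[0]))
```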
Objective: To evaluate unsupervised clustering methods for identifying individual-level behavioral-clinical phenotypes that relate personal biomarkers and behavioral traits in type 2 diabetes (T2DM) self-monitoring data. Materials and Methods: We used hierarchical clustering (HC) to identify groups of meals with similar nutrition and glycemic impact for 6 individuals with T2DM who collected self-monitoring data. We evaluated clusters on: 1) correspondence to gold standards generated by certified diabetes educators (CDEs) for 3 participants; 2) face validity, rated by CDEs; and 3) impact on CDEs' ability to identify patterns for another 3 participants. Results: The gold standard (GS) included 9 patterns across 3 participants. All 9 were rediscovered using HC: 4 GS patterns were consistent with patterns identified by HC (over 50% of meals in a cluster followed the pattern); the other 5 were included as sub-groups within broader clusters. 50% (9/18) of clusters were rated over 3 on a 5-point Likert scale for validity, significance, and actionability. After reviewing clusters, CDEs identified patterns that were more consistent with the data (a 70% reduction in contradictions between patterns and participants' records). Discussion: Hierarchical clustering of blood glucose and macronutrient consumption appears suitable for discovering behavioral-clinical phenotypes in T2DM. Most clusters corresponded to the gold standard and were rated positively by CDEs for face validity. Cluster visualizations helped CDEs identify more robust patterns in nutrition and glycemic impact, creating new possibilities for visual analytic solutions. Conclusion: Machine learning methods can use diabetes self-monitoring data to create personalized behavioral-clinical phenotypes, which may prove useful for delivering personalized medicine.
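The clustering step can be illustrated with a short sketch: hierarchical (Ward) clustering of meals described by macronutrients and post-meal glycemic impact, on synthetic data with hypothetical feature choices:

```python
# Illustrative sketch of the clustering step (feature choices are
# hypothetical, data are synthetic): Ward hierarchical clustering of
# meals by macronutrients and post-meal glycemic impact.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
# columns: carbs (g), protein (g), fat (g), post-meal glucose rise (mg/dL)
meals = np.vstack([rng.normal([70, 15, 10, 60], 8, (20, 4)),   # high-carb meals
                   rng.normal([20, 30, 25, 15], 8, (20, 4))])  # low-carb meals

Z = linkage(StandardScaler().fit_transform(meals), method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")    # cut the tree into 2 clusters
for c in np.unique(labels):
    sel = labels == c
    print(f"cluster {c}: mean carbs {meals[sel, 0].mean():.0f} g, "
          f"mean glucose rise {meals[sel, 3].mean():.0f} mg/dL")
```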

