بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Using Undersampling with Ensemble Learning to Identify Factors Contributing to Preterm Birth

347 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Chieh Wu T

تاريخ النشر 2020

مجال البحث الاحصاء الرياضي

والبحث باللغة English

تأليف Shi Dong - Zlatan Feric - Guangyu Li

تطبيقات الإحصاء المنهجية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In this paper, we propose Ensemble Learning models to identify factors contributing to preterm birth. Our work leverages a rich dataset collected by a NIEHS P42 Center that is trying to identify the dominant factors responsible for the high rate of premature births in northern Puerto Rico. We investigate analytical models addressing two major challenges present in the dataset: 1) the significant amount of incomplete data in the dataset, and 2) class imbalance in the dataset. First, we leverage and compare two types of missing data imputation methods: 1) mean-based and 2) similarity-based, increasing the completeness of this dataset. Second, we propose a feature selection and evaluation model based on using undersampling with Ensemble Learning to address class imbalance present in the dataset. We leverage and compare multiple Ensemble Feature selection methods, including Complete Linear Aggregation (CLA), Weighted Mean Aggregation (WMA), Feature Occurrence Frequency (OFA), and Classification Accuracy Based Aggregation (CAA). To further address missing data present in each feature, we propose two novel methods: 1) Missing Data Rate and Accuracy Based Aggregation (MAA), and 2) Entropy and Accuracy Based Aggregation (EAA). Both proposed models balance the degree of data variance introduced by the missing data handling during the feature selection process while maintaining model performance. Our results show a 42% improvement in sensitivity versus fallout over previous state-of-the-art methods.

قيم البحث

84 - Lea Friedli , David Ginsbourger , Jonas Bhend 2019

Probabilistic weather forecasts from ensemble systems require statistical postprocessing to yield calibrated and sharp predictive distributions. This paper presents an area-covering postprocessing method for ensemble precipitation predictions. We rel y on the ensemble model output statistics (EMOS) approach, which generates probabilistic forecasts with a parametric distribution whose parameters depend on (statistics of) the ensemble prediction. A case study with daily precipitation predictions across Switzerland highlights that postprocessing at observation locations indeed improves high-resolution ensemble forecasts, with 4.5% CRPS reduction on average in the case of a lead time of 1 day. Our main aim is to achieve such an improvement without binding the model to stations, by leveraging topographical covariates. Specifically, regression coefficients are estimated by weighting the training data in relation to the topographical similarity between their station of origin and the prediction location. In our case study, this approach is found to reproduce the performance of the local model without using local historical data for calibration. We further identify that one key difficulty is that postprocessing often degrades the performance of the ensemble forecast during summer and early autumn. To mitigate, we additionally estimate on the training set whether postprocessing at a specific location is expected to improve the prediction. If not, the direct model output is used. This extension reduces the CRPS of the topographical model by up to another 1.7% on average at the price of a slight degradation in calibration. In this case, the highest improvement is achieved for a lead time of 4 days.

تطبيقات الإحصاء المنهجية

Preterm Birth Analysis Using Nonlinear Methods (a preliminary study)

313 - Tijana T. Ivancevic , Lakhmi C. Jain , John Pattison 2008

In this report we review modern nonlinearity methods that can be used in the preterm birth analysis. The nonlinear analysis of uterine contraction signals can provide information regarding physiological changes during the menstrual cycle and pregnanc y. This information can be used both for the preterm birth prediction and the preterm labor control. Keywords: preterm birth, complex data analysis, nonlinear methods

الأساليب الكمية

Learning to Identify Electrons

114 - Julian Collado , Jessica N. Howard , Taylor Faucett 2020

We investigate whether state-of-the-art classification features commonly used to distinguish electrons from jet backgrounds in collider experiments are overlooking valuable information. A deep convolutional neural network analysis of electromagnetic and hadronic calorimeter deposits is compared to the performance of typical features, revealing a $approx 5%$ gap which indicates that these lower-level data do contain untapped classification power. To reveal the nature of this unused information, we use a recently developed technique to map the deep network into a space of physically interpretable observables. We identify two simple calorimeter observables which are not typically used for electron identification, but which mimic the decisions of the convolutional network and nearly close the performance gap.

تحليل البيانات والإحصاءات والاحتمال فيزياء الطاقة العالية - التجربة فيزياء الطاقة العالية - الظواهر

Using Multiple Pre-treatment Periods to Improve Difference-in-Differences and Staggered Adoption Design

64 - Naoki Egami , Soichiro Yamauchi 2021

While difference-in-differences (DID) was originally developed with one pre- and one post-treatment periods, data from additional pre-treatment periods is often available. How can researchers improve the DID design with such multiple pre-treatment pe riods under what conditions? We first use potential outcomes to clarify three benefits of multiple pre-treatment periods: (1) assessing the parallel trends assumption, (2) improving estimation accuracy, and (3) allowing for a more flexible parallel trends assumption. We then propose a new estimator, double DID, which combines all the benefits through the generalized method of moments and contains the two-way fixed effects regression as a special case. In a wide range of applications where several pre-treatment periods are available, the double DID improves upon the standard DID both in terms of identification and estimation accuracy. We also generalize the double DID to the staggered adoption design where different units can receive the treatment in different time periods. We illustrate the proposed method with two empirical applications, covering both the basic DID and staggered adoption designs. We offer an open-source R package that implements the proposed methodologies.

تطبيقات الإحصاء المنهجية

Success Factors Contributing to eGovernment Adoption in Saudi Arabia: G2C approach

315 - Ibrahim Abunadi , Louis Sanzogni , Kuldeep Sandhu 2012

Saudi Arabia is predetermined to implement eGovernment and provide world-class government services to citizens by 2010. However, this initiative will be meaningless if the people did not adopt these electronic services. Therefore, the purpose of this study is to determine success factors that will facilitate the adoption of eGovernment in Saudi Arabia. The results of the literature review have been deployed into surveys with Saudi eGovernment users. The discussion of the analysis from results obtained from the practical study has provided a framework that encompasses the eGovernment adoption success factors for Saudi Arabia.

أجهزة الكمبيوتر والمجتمع

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

معهد تكنولوجيا المعلومات ITI

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Using Undersampling with Ensemble Learning to Identify Factors Contributing to Preterm Birth

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً