
Using Undersampling with Ensemble Learning to Identify Factors Contributing to Preterm Birth

Published by Chieh Wu T
Publication date: 2020
Research field: Mathematical Statistics
Paper language: English





In this paper, we propose Ensemble Learning models to identify factors contributing to preterm birth. Our work leverages a rich dataset collected by a NIEHS P42 Center that is trying to identify the dominant factors responsible for the high rate of premature births in northern Puerto Rico. We investigate analytical models addressing two major challenges present in the dataset: 1) the significant amount of incomplete data in the dataset, and 2) class imbalance in the dataset. First, we leverage and compare two types of missing data imputation methods: 1) mean-based and 2) similarity-based, increasing the completeness of this dataset. Second, we propose a feature selection and evaluation model based on using undersampling with Ensemble Learning to address class imbalance present in the dataset. We leverage and compare multiple Ensemble Feature selection methods, including Complete Linear Aggregation (CLA), Weighted Mean Aggregation (WMA), Feature Occurrence Frequency (OFA), and Classification Accuracy Based Aggregation (CAA). To further address missing data present in each feature, we propose two novel methods: 1) Missing Data Rate and Accuracy Based Aggregation (MAA), and 2) Entropy and Accuracy Based Aggregation (EAA). Both proposed models balance the degree of data variance introduced by the missing data handling during the feature selection process while maintaining model performance. Our results show a 42% improvement in sensitivity versus fallout over previous state-of-the-art methods.
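The abstract names three building blocks: mean-based imputation, undersampling to balance the classes, and aggregation of feature selections across an ensemble. The following is a minimal sketch of how these pieces could fit together, not the authors' implementation. The abstract does not give formulas for CLA, WMA, OFA, CAA, MAA, or EAA, so the aggregation here is a plain occurrence-frequency count, the feature scorer (absolute difference of class means) is a stand-in, and every function name and the toy data are hypothetical.

```python
import random
from collections import Counter


def mean_impute(rows):
    """Replace None in each column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def undersample(rows, labels, rng):
    """Keep every minority-class row plus an equally sized random draw of majority rows."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def rank_features(rows, labels, top_k):
    """Stand-in scorer (an assumption): absolute difference between class means."""
    n_cols = len(rows[0])
    scores = []
    for j in range(n_cols):
        a = [r[j] for r, y in zip(rows, labels) if y == 1]
        b = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    order = sorted(range(n_cols), key=lambda j: scores[j], reverse=True)
    return order[:top_k]


def occurrence_frequency(rows, labels, n_rounds=25, top_k=1, seed=0):
    """Fraction of balanced subsamples in which each feature lands in the top_k."""
    rng = random.Random(seed)
    filled = mean_impute(rows)
    counts = Counter()
    for _ in range(n_rounds):
        xs, ys = undersample(filled, labels, rng)
        counts.update(rank_features(xs, ys, top_k))
    return {j: counts[j] / n_rounds for j in range(len(rows[0]))}


# Toy data (hypothetical): 3 positive and 9 negative rows, two features, some values missing.
rows = [[10.0, 1.0], [10.0, None], [10.0, 1.0]] + [[0.0, 1.0]] * 8 + [[None, 1.0]]
labels = [1, 1, 1] + [0] * 9
freq = occurrence_frequency(rows, labels)
```

In this toy setup, feature 0 separates the classes and feature 1 is constant, so `freq` reports how often each feature is selected across the balanced subsamples. A real pipeline would replace the stand-in scorer with a trained classifier per subsample and could weight each vote by accuracy or by the feature's missing-data rate, in the spirit of the aggregation methods the abstract lists.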




Read also

Probabilistic weather forecasts from ensemble systems require statistical postprocessing to yield calibrated and sharp predictive distributions. This paper presents an area-covering postprocessing method for ensemble precipitation predictions. We rely on the ensemble model output statistics (EMOS) approach, which generates probabilistic forecasts with a parametric distribution whose parameters depend on (statistics of) the ensemble prediction. A case study with daily precipitation predictions across Switzerland highlights that postprocessing at observation locations indeed improves high-resolution ensemble forecasts, with 4.5% CRPS reduction on average in the case of a lead time of 1 day. Our main aim is to achieve such an improvement without binding the model to stations, by leveraging topographical covariates. Specifically, regression coefficients are estimated by weighting the training data in relation to the topographical similarity between their station of origin and the prediction location. In our case study, this approach is found to reproduce the performance of the local model without using local historical data for calibration. We further identify that one key difficulty is that postprocessing often degrades the performance of the ensemble forecast during summer and early autumn. To mitigate, we additionally estimate on the training set whether postprocessing at a specific location is expected to improve the prediction. If not, the direct model output is used. This extension reduces the CRPS of the topographical model by up to another 1.7% on average at the price of a slight degradation in calibration. In this case, the highest improvement is achieved for a lead time of 4 days.
In this report we review modern nonlinearity methods that can be used in the preterm birth analysis. The nonlinear analysis of uterine contraction signals can provide information regarding physiological changes during the menstrual cycle and pregnancy. This information can be used both for the preterm birth prediction and the preterm labor control. Keywords: preterm birth, complex data analysis, nonlinear methods
We investigate whether state-of-the-art classification features commonly used to distinguish electrons from jet backgrounds in collider experiments are overlooking valuable information. A deep convolutional neural network analysis of electromagnetic and hadronic calorimeter deposits is compared to the performance of typical features, revealing a $\approx 5\%$ gap which indicates that these lower-level data do contain untapped classification power. To reveal the nature of this unused information, we use a recently developed technique to map the deep network into a space of physically interpretable observables. We identify two simple calorimeter observables which are not typically used for electron identification, but which mimic the decisions of the convolutional network and nearly close the performance gap.
While difference-in-differences (DID) was originally developed with one pre-treatment and one post-treatment period, data from additional pre-treatment periods is often available. How, and under what conditions, can researchers improve the DID design with such multiple pre-treatment periods? We first use potential outcomes to clarify three benefits of multiple pre-treatment periods: (1) assessing the parallel trends assumption, (2) improving estimation accuracy, and (3) allowing for a more flexible parallel trends assumption. We then propose a new estimator, double DID, which combines all the benefits through the generalized method of moments and contains the two-way fixed effects regression as a special case. In a wide range of applications where several pre-treatment periods are available, the double DID improves upon the standard DID both in terms of identification and estimation accuracy. We also generalize the double DID to the staggered adoption design where different units can receive the treatment in different time periods. We illustrate the proposed method with two empirical applications, covering both the basic DID and staggered adoption designs. We offer an open-source R package that implements the proposed methodologies.
Saudi Arabia is determined to implement eGovernment and provide world-class government services to citizens by 2010. However, this initiative will be meaningless if people do not adopt these electronic services. Therefore, the purpose of this study is to determine success factors that will facilitate the adoption of eGovernment in Saudi Arabia. The results of the literature review have been deployed into surveys with Saudi eGovernment users. The analysis of the results obtained from the practical study has provided a framework that encompasses the eGovernment adoption success factors for Saudi Arabia.