In this paper, we propose ensemble learning models to identify factors contributing to preterm birth. Our work leverages a rich dataset collected by an NIEHS P42 Center that is trying to identify the dominant factors responsible for the high rate of premature births in northern Puerto Rico. We investigate analytical models that address two major challenges in this dataset: 1) a significant amount of incomplete data, and 2) class imbalance. First, to increase the completeness of the dataset, we apply and compare two types of missing-data imputation: 1) mean-based and 2) similarity-based. Second, we propose a feature selection and evaluation model that combines undersampling with ensemble learning to address the class imbalance. We apply and compare multiple ensemble feature selection methods, including Complete Linear Aggregation (CLA), Weighted Mean Aggregation (WMA), Feature Occurrence Frequency (OFA), and Classification Accuracy Based Aggregation (CAA). To further account for the missing data in each feature, we propose two novel methods: 1) Missing Data Rate and Accuracy Based Aggregation (MAA), and 2) Entropy and Accuracy Based Aggregation (EAA). Both proposed methods balance the data variance introduced by missing-data handling during feature selection while maintaining model performance. Our results show a 42% improvement in sensitivity versus fallout over previous state-of-the-art methods.
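As a rough illustration of the pipeline outlined in this abstract, the sketch below chains mean-based imputation, random undersampling, and an occurrence-frequency style aggregation of the features selected on each balanced subsample. It is not the authors' implementation: the function names, the toy data, and the base selector (ranking features by the absolute difference of class means) are all assumptions made for the example, and the similarity-based imputation and the CLA, WMA, CAA, MAA, and EAA schemes are not shown.

```python
import random
from collections import Counter
from statistics import mean


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def undersample(rows, labels, rng):
    """Keep every minority-class row plus an equal-size random majority sample."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def rank_features(rows, labels, k):
    """Hypothetical base selector: top-k features by |difference of class means|."""
    scores = []
    for j in range(len(rows[0])):
        m1 = mean(r[j] for r, y in zip(rows, labels) if y == 1)
        m0 = mean(r[j] for r, y in zip(rows, labels) if y == 0)
        scores.append(abs(m1 - m0))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def occurrence_frequency(rows, labels, n_rounds=25, k=1, seed=0):
    """Fraction of balanced subsamples in which each feature is selected."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_rounds):
        xs, ys = undersample(rows, labels, rng)
        counts.update(rank_features(xs, ys, k))
    return {j: counts[j] / n_rounds for j in range(len(rows[0]))}


# Toy data (invented for illustration): 3 positive and 9 negative cases,
# two features, with None marking a missing value.
rows = [[10.0, 1.0], [11.0, 0.0], [None, 1.0],
        [0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.0, None],
        [1.0, 1.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

freq = occurrence_frequency(mean_impute(rows), labels)
print(freq)
```

Repeating the selection over many balanced subsamples, rather than a single one, is what makes this an ensemble: a feature that is selected only because of one lucky draw of majority-class rows receives a low frequency.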
Probabilistic weather forecasts from ensemble systems require statistical postprocessing to yield calibrated and sharp predictive distributions. This paper presents an area-covering postprocessing method for ensemble precipitation predictions. We rel
In this report we review modern nonlinear methods that can be used in preterm birth analysis. The nonlinear analysis of uterine contraction signals can provide information regarding physiological changes during the menstrual cycle and pregnanc
We investigate whether state-of-the-art classification features commonly used to distinguish electrons from jet backgrounds in collider experiments are overlooking valuable information. A deep convolutional neural network analysis of electromagnetic
While difference-in-differences (DID) was originally developed with one pre- and one post-treatment period, data from additional pre-treatment periods is often available. How can researchers improve the DID design with such multiple pre-treatment pe
Saudi Arabia is determined to implement eGovernment and provide world-class government services to citizens by 2010. However, this initiative will be meaningless if people do not adopt these electronic services. Therefore, the purpose of this