In this report, we review modern nonlinear methods that can be used in preterm birth analysis. Nonlinear analysis of uterine contraction signals can provide information about physiological changes during the menstrual cycle and pregnancy. This information can be used both for preterm birth prediction and for preterm labor control. Keywords: preterm birth, complex data analysis, nonlinear methods
Preterm infants are at high risk of developing brain injury in the first days of life as a consequence of poor cerebral oxygen delivery. Near-infrared spectroscopy (NIRS) is an established technology developed to monitor regional tissue oxygenation. Detailed waveform analysis of the cerebral NIRS signal could improve the clinical utility of this method in accurately predicting brain injury. Frequent transient cerebral oxygen desaturations are commonly observed in extremely preterm infants, yet their clinical significance remains unclear. The aim of this study was to examine and compare the performance of two distinct approaches to isolating and extracting transient deflections within NIRS signals. We optimized three different simultaneous low-pass filtering and total variation denoising (LPF_TVD) methods and compared their performance with a recently proposed method that uses singular-spectrum analysis and the discrete cosine transform (SSA_DCT). Parameters for the LPF_TVD methods were optimized by grid search using synthetic NIRS-like signals. The SSA_DCT method was modified with a post-processing procedure to increase sparsity in the extracted components. Our analysis, using a synthetic NIRS-like dataset, showed that an LPF_TVD method outperformed the modified SSA_DCT method: the median mean-squared error was lower for the LPF_TVD method, at 0.97 (95% CI: 0.86 to 1.07), than for the modified SSA_DCT method, at 1.48 (95% CI: 1.33 to 1.63), P<0.001. The dual low-pass filter and total variation denoising methods are also considerably more computationally efficient, by 3 to 4 orders of magnitude, than the SSA_DCT method. More research is needed to examine the efficacy of these methods in extracting oxygen desaturations from real NIRS signals.
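As a rough illustration of the total variation denoising component only, the sketch below solves the standard 1D problem min_x 0.5*||y - x||^2 + lam*sum|x[i+1] - x[i]| by projected gradient on the dual variable. It is not the simultaneous LPF_TVD method evaluated in the study; the function names `tv_denoise` and `mse`, the parameter `lam`, and the iteration count are assumptions made for this example.

```python
def tv_denoise(y, lam, n_iter=200):
    """Approximately solve min_x 0.5*||y - x||^2 + lam*sum|x[i+1] - x[i]|
    by projected gradient on the dual variable z (one entry per difference)."""
    n = len(y)
    if n < 2:
        return list(y)
    z = [0.0] * (n - 1)
    alpha = 4.0  # upper bound on the largest eigenvalue of D D^T

    def primal(z):
        # x = y - D^T z, where (D x)[i] = x[i+1] - x[i]
        x = []
        for j in range(n):
            left = z[j - 1] if j > 0 else 0.0
            right = z[j] if j < n - 1 else 0.0
            x.append(y[j] - (left - right))
        return x

    for _ in range(n_iter):
        x = primal(z)
        for i in range(n - 1):
            step = z[i] + (x[i + 1] - x[i]) / alpha
            z[i] = max(-lam, min(lam, step))  # project onto |z| <= lam
    return primal(z)


def mse(a, b):
    """Mean-squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)
```

Larger values of `lam` flatten the signal more aggressively, so in practice the parameter would be tuned, for example by a grid search against synthetic signals with known ground truth, as the abstract describes.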
In this paper, we propose Ensemble Learning models to identify factors contributing to preterm birth. Our work leverages a rich dataset collected by a NIEHS P42 Center that is trying to identify the dominant factors responsible for the high rate of premature births in northern Puerto Rico. We investigate analytical models that address two major challenges in the dataset: 1) the significant amount of incomplete data, and 2) class imbalance. First, we apply and compare two types of missing data imputation methods, 1) mean-based and 2) similarity-based, to increase the completeness of the dataset. Second, we propose a feature selection and evaluation model that uses undersampling with Ensemble Learning to address the class imbalance. We apply and compare multiple Ensemble Feature selection methods, including Complete Linear Aggregation (CLA), Weighted Mean Aggregation (WMA), Feature Occurrence Frequency (OFA), and Classification Accuracy Based Aggregation (CAA). To further address the missing data present in each feature, we propose two novel methods: 1) Missing Data Rate and Accuracy Based Aggregation (MAA), and 2) Entropy and Accuracy Based Aggregation (EAA). Both proposed methods balance the degree of data variance introduced by missing data handling during the feature selection process while maintaining model performance. Our results show a 42% improvement in sensitivity versus fallout over previous state-of-the-art methods.
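The two preprocessing ideas named above, mean-based imputation and undersampling to form balanced training subsets for an ensemble, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the use of `None` for missing values, the 0/1 label encoding, and the fixed seed are all assumptions, and none of the aggregation methods (CLA, WMA, OFA, CAA, MAA, EAA) are implemented here.

```python
import random


def mean_impute(rows):
    """Replace None in each column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        # Fall back to 0.0 for a column with no observed values (an assumption).
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[c] if r[c] is None else r[c] for c in range(n_cols)]
            for r in rows]


def balanced_subsets(labels, n_subsets, seed=0):
    """Return index lists: each keeps every minority-class sample plus an
    equal-sized random draw (without replacement) from the majority class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return [minority + rng.sample(majority, len(minority))
            for _ in range(n_subsets)]
```

Each index list from `balanced_subsets` could be used to train one ensemble member, so that every member sees a balanced class distribution while the ensemble as a whole covers more of the majority class.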
The use of convolutional neural networks (CNNs) for classification tasks has become dominant in various medical imaging applications. At the same time, recent advances in interpretable machine learning techniques have shown great potential in explaining classifiers' decisions. Layer-wise relevance propagation (LRP) has been introduced as one of these novel methods that aim to provide a visual interpretation of a network's decisions. In this work, we propose, for the first time, the application of 3D CNNs with LRP to neonatal T2-weighted magnetic resonance imaging (MRI) data analysis. Through LRP, the decisions of our trained classifier are transformed into heatmaps indicating each voxel's relevance to the outcome of the decision. Our resulting LRP heatmaps reveal anatomically plausible features for distinguishing preterm neonates from term ones.
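To make the idea of relevance propagation concrete, the sketch below applies the commonly used LRP epsilon rule to a single bias-free dense layer. It is only a toy illustration: the work above uses a 3D CNN, and the abstract does not state which LRP rule was applied. The function name, argument layout (`W[i][j]` connects input `i` to output `j`), and `eps` value are assumptions.

```python
def lrp_epsilon_dense(a, W, R_out, eps=1e-9):
    """Redistribute output relevance R_out onto the inputs of a dense layer.

    a     : input activations, length n_in
    W     : weights, W[i][j] connects input i to output j
    R_out : relevance assigned to each output, length n_out
    """
    n_in, n_out = len(a), len(R_out)
    # Pre-activations z_j = sum_i a_i * w_ij (no bias term in this sketch).
    z = [sum(a[i] * W[i][j] for i in range(n_in)) for j in range(n_out)]
    R_in = [0.0] * n_in
    for j in range(n_out):
        # The small eps stabilizes the division when z_j is close to zero.
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        for i in range(n_in):
            R_in[i] += a[i] * W[i][j] / denom * R_out[j]
    return R_in
```

With a small `eps` and no bias, the total relevance is approximately conserved from outputs to inputs. Applying such a rule layer by layer back to the input is what yields a per-voxel heatmap.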
In this paper, we describe a Graphical User Interface (GUI) designed to manage large quantities of image data of a biological system. After setting the design requirements for the system, we developed an ecology quantification GUI that assists biologists in analysing data. We focus on the main features of the interface and present the results and an evaluation of the system. Finally, we provide some directions for future work.
To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross validation within a single study to assess model accuracy. While an essential first step, cross validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets. To provide a more rigorous assessment of model generalizability between different studies, we use machine learning to analyze five publicly available cell line-based data sets: NCI60, CTRP, GDSC, CCLE and gCSI. Based on observed experimental variability across studies, we explore estimates of prediction upper bounds. We report performance results of a variety of machine learning models, with a multitasking deep neural network achieving the best cross-study generalizability. By multiple measures, models trained on CTRP yield the most accurate predictions on the remaining testing data, and gCSI is the most predictable among the cell line data sets included in this study. With these experiments and further simulations on partial data, two lessons emerge: (1) differences in viability assays can limit model generalizability across studies, and (2) drug diversity, more than tumor diversity, is crucial for raising model generalizability in preclinical screening.
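The cross-study evaluation described above, train on one study and test on every study, can be sketched as a simple loop over study pairs. This is a generic illustration under assumed names: `cross_study_matrix`, `fit_mean`, and `score_mse` are hypothetical helpers, the mean predictor stands in for a real model, and the toy studies "A" and "B" are placeholders, not the data sets named in the abstract.

```python
def cross_study_matrix(studies, fit, score):
    """studies: dict mapping study name -> (X, y).
    Returns a dict mapping (train_name, test_name) -> score."""
    results = {}
    for train_name, (X_tr, y_tr) in studies.items():
        model = fit(X_tr, y_tr)  # train once per source study
        for test_name, (X_te, y_te) in studies.items():
            results[(train_name, test_name)] = score(model, X_te, y_te)
    return results


def fit_mean(X, y):
    """Placeholder model: always predict the mean training response."""
    return sum(y) / len(y)


def score_mse(model, X, y):
    """Mean-squared error of the constant prediction on a test study."""
    return sum((model - t) ** 2 for t in y) / len(y)


# Toy usage with two made-up studies.
toy = {"A": ([[0], [0]], [1.0, 1.0]), "B": ([[0]], [3.0])}
matrix = cross_study_matrix(toy, fit_mean, score_mse)
```

Entries with the same train and test study are evaluated on the training data itself, so they are optimistic. The entries with different train and test studies measure generalizability between studies.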