Data Mining in Large Frequency Tables With Ontology, with an Application to the Vaccine Adverse Event Reporting System


Added by Bangyao Zhao

Publication date: 2020

Fields: Mathematical Statistics

Research language: English

Authors: Bangyao Zhao, Lili Zhao

Applications


Vaccine safety is a major public concern, and many signal detection methods have been developed to estimate relative risks (RRs) between vaccines and adverse events (AEs). These methods usually focus on individual AEs, where the data are highly noisy, so their results are often inaccurate and lack clinical meaning. The AE ontology encodes information about the biological similarity of AEs. Building on this, we extend the concept of RRs to the AE group level, which allows more accurate and meaningful estimation by using data from the whole group. In this paper, we propose zGPS.AO (Zero-Inflated Gamma Poisson Shrinker with AE Ontology), a method based on the zero-inflated negative binomial distribution. The model has two parts: a regression model that estimates group-level RRs, and an empirical Bayes framework that evaluates AE-level RRs. The regression part handles both excess zeros and overdispersion in the data, and the empirical Bayes step borrows information from both the group level and the AE level to reduce noise and stabilize the AE-level results. We demonstrate with simulated data that our model is unbiased and has low variance, and we obtain meaningful results on the VAERS (Vaccine Adverse Event Reporting System) database that are consistent with previous studies. The proposed methods are implemented in the R package zGPS.AO, which can be installed from the Comprehensive R Archive Network (CRAN). The results on VAERS data are visualized with an interactive R Shiny web app.
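As a rough illustration of the two model pieces described above, the Python sketch below writes down a zero-inflated negative binomial (ZINB) probability mass function and an empirical Bayes posterior mean for an AE-level RR. It is not the zGPS.AO implementation (the package is written in R), and it does not fit the group-level regression: it assumes the group RR, the dispersion `r`, and the zero-inflation probability `omega` have already been estimated. It also assumes a standard gamma-Poisson hierarchy (AE-level RR with a Gamma prior whose mean is the group RR; count Poisson with mean RR times E unless it is a structural zero), with the baseline count E taken from the table margins as in the classical Gamma Poisson Shrinker. All function names are hypothetical.

```python
import math


def nb_pmf(y: int, mu: float, r: float) -> float:
    """Negative binomial pmf parameterised by mean mu (> 0) and size r (> 0)."""
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y: int, mu: float, r: float, omega: float) -> float:
    """Zero-inflated NB: with probability omega the count is a structural zero."""
    p = (1.0 - omega) * nb_pmf(y, mu, r)
    return omega + p if y == 0 else p


def expected_count(n_vaccine: float, n_ae: float, n_total: float) -> float:
    """Baseline count under independence: row total * column total / grand total."""
    return n_vaccine * n_ae / n_total


def shrunk_rr(y: int, E: float, group_rr: float, r: float, omega: float) -> float:
    """Posterior mean of an AE-level RR, shrunk toward its (already estimated) group RR.

    Assumed hierarchy (illustrative, not necessarily the zGPS.AO formulation):
      lam ~ Gamma(shape=r, rate=r / group_rr)   # prior mean = group_rr
      y | lam ~ Poisson(lam * E), except a structural zero with probability omega
    """
    rate = r / group_rr
    gamma_mean = (r + y) / (rate + E)  # conjugate Gamma(r + y, rate + E) posterior mean
    if y > 0:
        return gamma_mean  # a positive count cannot be a structural zero
    # Posterior probability that an observed zero is structural (lam then keeps its prior).
    p_struct = omega / (omega + (1.0 - omega) * nb_pmf(0, group_rr * E, r))
    return p_struct * group_rr + (1.0 - p_struct) * gamma_mean


# Usage: the raw ratio y / E = 6 is pulled toward the group RR of 2.
print(shrunk_rr(3, 0.5, 2.0, 1.0, 0.2))  # 4.0
```

Under these assumptions, a positive count gives the usual conjugate gamma posterior, while a zero count mixes the prior mean (if the zero is structural) with that posterior, so zeros are not all read as evidence of low risk.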

Gaussian Process Nowcasting: Application to COVID-19 Mortality Reporting

Iwona Hawryluk, Henrique Hoeltgebaum, Swapnil Mishra (2021)

Updating observations of a signal to account for delays in the measurement process is a common problem in signal processing, with prominent examples in a wide range of fields. An important example of this problem is the nowcasting of COVID-19 mortality: given a stream of reported counts of daily deaths, can we correct for the delays in reporting to paint an accurate picture of the present, with uncertainty? Without this correction, raw data will often mislead by suggesting an improving situation. We present a flexible approach using a latent Gaussian process that is capable of describing the changing autocorrelation structure present in the reporting time-delay surface. This approach also yields robust estimates of uncertainty for the nowcasted numbers of deaths. We test assumptions in model specification, such as the choice of kernel or hyperpriors, and evaluate model performance on a challenging real dataset from Brazil. Our experiments show that Gaussian process nowcasting performs favourably against both comparable methods and a small sample of expert human predictions. Our approach has substantial practical utility in disease modelling: by applying it to COVID-19 mortality data from Brazil, where reporting delays are large, we can make informative predictions of important epidemiological quantities such as the current effective reproduction number.

Applications Machine Learning

Multi-resolution Spatial Regression for Aggregated Data with an Application to Crop Yield Prediction

Harrison Zhu, Adam Howes, Owen van Eer (2021)

We develop a new methodology for spatial regression of aggregated outputs on multi-resolution covariates. Such problems often occur with spatial data, for example in crop yield prediction, where the output is spatially-aggregated over an area and the covariates may be observed at multiple resolutions. Building upon previous work on aggregated output regression, we propose a regression framework to synthesise the effects of the covariates at different resolutions on the output and provide uncertainty estimation. We show that, for a crop yield prediction problem, our approach is more scalable, via variational inference, than existing multi-resolution regression models. We also show that our framework yields good predictive performance, compared to existing multi-resolution crop yield models, whilst being able to provide estimation of the underlying spatial effects.

Applications Methodology

Bayesian Nonparametric Classification for Incomplete Data With a High Missing Rate: an Application to Semiconductor Manufacturing Data

Sewon Park, Kyeongwon Lee, Da-Eun Jeong (2021)

During semiconductor manufacturing, predicting the yield of the semiconductor is an important problem. Early detection of defective products during manufacturing can greatly reduce production costs. The data generated by the semiconductor manufacturing process have highly non-normal distributions, complicated missing patterns, and a high missing rate, all of which complicate yield prediction. We propose the Dirichlet process-naive Bayes model (DPNB), a classification method based on Dirichlet process mixtures and the naive Bayes model. Because the DPNB is built on Dirichlet process mixtures and learns the joint distribution of all variables involved, it can handle highly non-normal data and make predictions for test data with any missing pattern. The DPNB also performs well at high missing rates because it uses all information in the observed components. Experiments on various real datasets, including semiconductor manufacturing data, show that the DPNB outperforms MICE and MissForest in predicting missing values as the percentage of missing values increases.

Applications
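The claim that a naive Bayes structure copes with arbitrary missing patterns can be seen in a deliberately simplified Python sketch. Here each class-conditional feature density is a single Gaussian rather than the Dirichlet process mixture used by the DPNB, so this is plain Gaussian naive Bayes, not the authors' method; the function names and the variance floor `VAR_FLOOR` are hypothetical. Because features are conditionally independent given the class, integrating out a missing feature simply drops its likelihood term, so fitting and prediction use every observed component and skip the rest.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]  # None marks a missing value
Params = Dict[str, Tuple[float, List[Optional[Tuple[float, float]]]]]

VAR_FLOOR = 1e-6  # hypothetical smoothing constant to avoid zero variance


def fit(X: List[Row], y: List[str]) -> Params:
    """Estimate class priors and per-class (mean, variance) for each feature."""
    params: Params = {}
    n_features = len(X[0])
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        feats: List[Optional[Tuple[float, float]]] = []
        for j in range(n_features):
            vals = [r[j] for r in rows if r[j] is not None]  # observed entries only
            if not vals:
                feats.append(None)  # feature never observed for this class
                continue
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + VAR_FLOOR
            feats.append((mean, var))
        params[c] = (len(rows) / len(y), feats)
    return params


def log_normal_pdf(v: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)


def predict(params: Params, x: Row) -> str:
    """Return the class with the highest log posterior, ignoring missing features."""
    best_class, best_score = "", -math.inf
    for c, (prior, feats) in params.items():
        score = math.log(prior)
        for v, p in zip(x, feats):
            if v is None or p is None:
                continue  # missing value: integrating it out leaves a factor of 1
            score += log_normal_pdf(v, *p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```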

Sequential Pattern Mining of Longitudinal Adverse Events After Left Ventricular Assist Device Implant

Faezeh Movahedi, Robert L. Kormos, Lisa Lohmueller (2019)

Left ventricular assist devices (LVADs) are an increasingly common therapy for patients with advanced heart failure. However, LVAD implantation increases the risk of stroke, infection, bleeding, and other serious adverse events (AEs). Most studies of post-LVAD AEs have focused on individual AEs in isolation, neglecting possible interrelations or causal links between AEs. This study is the first to conduct an exploratory analysis to discover common sequential chains of AEs following LVAD implantation that are correlated with important clinical outcomes. The analysis was derived from 58,575 recorded AEs for 13,192 patients in the Interagency Registry for Mechanically Assisted Circulatory Support (INTERMACS) who received a continuous-flow LVAD between 2006 and 2015. The pattern mining procedure involved three main steps: (1) creating a bank of AE sequences by converting the AEs for each patient into a single, chronologically sequenced record, (2) grouping patients with similar AE sequences using hierarchical clustering, and (3) extracting temporal chains of AEs for each group of patients using Markov modeling. The mined results indicate the existence of seven groups of sequential chains of AEs, characterized by common types of AEs that occurred in a unique order. The groups were identified as: GRP1: Recurrent bleeding, GRP2: Trajectory of device malfunction & explant, GRP3: Infection, GRP4: Trajectories to transplant, GRP5: Cardiac arrhythmia, GRP6: Trajectory of neurological dysfunction & death, and GRP7: Trajectory of respiratory failure, renal dysfunction & death. These patterns of sequential post-LVAD AEs reveal potential interdependence between AEs and may aid the prediction and prevention of subsequent AEs in future studies.

Applications
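Steps (1) and (3) of the LVAD pattern-mining procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it takes hypothetical `(patient_id, day, ae_type)` records, builds one chronological AE sequence per patient, and estimates a first-order Markov transition matrix by counting consecutive AE pairs. Step (2), hierarchical clustering, is omitted; in the study the transition estimate would be computed separately within each patient group.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_sequences(events: Iterable[Tuple[int, int, str]]) -> Dict[int, List[str]]:
    """Step 1: turn (patient_id, day, ae_type) records into one chronological AE list per patient."""
    by_patient: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    for patient, day, ae in events:
        by_patient[patient].append((day, ae))
    return {p: [ae for _, ae in sorted(evts)] for p, evts in by_patient.items()}


def transition_matrix(sequences: Iterable[List[str]]) -> Dict[str, Dict[str, float]]:
    """Step 3 (simplified): first-order Markov transition probabilities between consecutive AEs."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    # Normalise each row of counts so the outgoing probabilities of an AE sum to 1.
    return {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in counts.items()
    }
```

A first-order chain is the simplest choice; it only captures which AE tends to follow which, not longer histories.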

Protocol for a Study of the Effect of Surface Mining in Central Appalachia on Adverse Birth Outcomes

Dylan S. Small, Dan Firth, Luke Keele (2020)

Surface mining has become a major method of coal mining in Central Appalachia alongside traditional underground mining. Concerns have been raised about the health effects of surface mining, particularly mountaintop removal mining, in which coal is extracted from steep mountaintops by removing the mountaintop through forest clearcutting and explosives. We have designed a matched observational study to assess the effects of surface mining in Central Appalachia on adverse birth outcomes. This protocol describes the study's background and motivation, sample selection, and analysis plan.

Applications