
Estimating heterogeneous effects of continuous exposures using Bayesian tree ensembles: revisiting the impact of abortion rates on crime

Added by Spencer Woody
Publication date: 2020
Language: English





In estimating the causal effect of a continuous exposure or treatment, it is important to control for all confounding factors. However, most existing methods require a parametric specification for how control variables influence the outcome or the generalized propensity score, and inference on treatment effects is usually sensitive to this choice. Additionally, it is often of interest to estimate how the treatment effect varies across observed units. To address this gap, we propose a semiparametric model using Bayesian tree ensembles for estimating the causal effect of a continuous treatment or exposure which (i) does not require a priori parametric specification of the influence of control variables, and (ii) allows for identification of effect modification by pre-specified moderators. The main parametric assumption we make is that the effect of the exposure on the outcome is linear, with the steepness of this relationship determined by a nonparametric function of the moderators, and we provide heuristics to diagnose the validity of this assumption. We apply our methods to revisit a 2001 study of how abortion rates affect the incidence of crime.
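As a rough illustration of the partially linear structure described above (outcome nonparametric in the controls, linear in the exposure with a slope that varies with the moderators), the sketch below simulates data of that form in Python. It does not reproduce the paper's tree-ensemble fit; the per-moderator least-squares slopes are only a crude stand-in, and all variable names and parameter values are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)
n = 2000

x = rng.normal(size=(n, 3))                 # control variables (potential confounders)
m = rng.integers(0, 2, size=n)              # a binary effect moderator
z = 0.5 * x[:, 0] + rng.normal(size=n)      # continuous exposure, confounded by x

f = np.sin(x[:, 0]) + x[:, 1] ** 2          # nonparametric prognostic effect of the controls
tau = np.where(m == 1, 1.5, 0.5)            # exposure effect depends on the moderator
y = f + tau * z + rng.normal(scale=0.5, size=n)

# Crude check of the linear-in-exposure assumption: a linear adjustment for the
# controls within each moderator level (the paper instead uses a Bayesian tree
# ensemble with full posterior inference).
for level in (0, 1):
    idx = m == level
    X = np.column_stack([np.ones(idx.sum()), x[idx], z[idx]])
    beta, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
    print(f"moderator = {level}: estimated exposure slope ~ {beta[-1]:.2f}")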



Related research

Two key challenges in modern statistical applications are the large amount of information recorded per individual, and that such data are often not collected all at once but in batches. These batch effects can be complex, causing distortions in both mean and variance. We propose a novel sparse latent factor regression model to integrate such heterogeneous data. The model provides a tool for data exploration via dimensionality reduction while correcting for a range of batch effects. We study the use of several sparse priors (local and non-local) to learn the dimension of the latent factors. Our model is fitted in a deterministic fashion by means of an EM algorithm for which we derive closed-form updates, contributing a novel scalable algorithm for non-local priors of interest beyond the immediate scope of this paper. We present several examples, with a focus on bioinformatics applications. Our results show an increase in the accuracy of the dimensionality reduction, with non-local priors substantially improving the reconstruction of factor cardinality, as well as the need to account for batch effects to obtain reliable results. Our model provides a novel approach to latent factor regression that balances sparsity with sensitivity and is highly computationally efficient.
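A minimal simulation of the data structure this abstract describes (sparse factor loadings plus batch distortions in both mean and variance) is sketched below. The within-batch standardisation and truncated SVD are only rough stand-ins for the sparse-prior EM fit described in the paper, and all dimensions and values are invented.

import numpy as np

rng = np.random.default_rng(1)
n, p, k = 300, 50, 3                       # samples, features, latent factors

factors = rng.normal(size=(n, k))          # latent factors
loadings = rng.normal(size=(k, p))
loadings[rng.random((k, p)) < 0.7] = 0.0   # sparse loadings

batch = rng.integers(0, 2, size=n)         # two batches
batch_shift = np.where(batch[:, None] == 1, 1.0, 0.0)   # mean distortion
batch_scale = np.where(batch[:, None] == 1, 1.5, 1.0)   # variance distortion

noise = rng.normal(size=(n, p)) * batch_scale
Y = factors @ loadings + batch_shift + noise

# Crude batch correction (centre and scale within batch), then a truncated SVD
# as a stand-in for the sparse latent factor regression fit.
Y_adj = np.empty_like(Y)
for b in (0, 1):
    idx = batch == b
    Y_adj[idx] = (Y[idx] - Y[idx].mean(axis=0)) / Y[idx].std(axis=0)

U, s, Vt = np.linalg.svd(Y_adj, full_matrices=False)
print("variance explained by leading factors:", np.round(s[:5] ** 2 / (s ** 2).sum(), 2))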
Duncan Lee, Gavin Shaddick (2012)
The relationship between short-term exposure to air pollution and mortality or morbidity has been the subject of much recent research, in which the standard method of analysis uses Poisson linear or additive models. In this paper we use a Bayesian dynamic generalised linear model (DGLM) to estimate this relationship, which allows the standard linear or additive model to be extended in two ways: (i) the long-term trend and temporal correlation present in the health data can be modelled by an autoregressive process rather than a smooth function of calendar time; (ii) the effects of air pollution are allowed to evolve over time. The efficacy of these two extensions is investigated by applying a series of dynamic and non-dynamic models to air pollution and mortality data from Greater London. A Bayesian approach is taken throughout, and a Markov chain Monte Carlo simulation algorithm is presented for inference. An alternative likelihood-based analysis is also presented, in order to allow a direct comparison with the only previous analysis of air pollution and health data using a DGLM.
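The two extensions described (an autoregressive latent trend and a pollution effect that evolves over time) can be made concrete with a small simulation from such a Poisson dynamic model. This is only a data-generating sketch with invented parameter values, not the MCMC inference scheme the paper presents.

import numpy as np

rng = np.random.default_rng(2)
T = 365

pollution = rng.gamma(shape=5.0, scale=2.0, size=T)   # daily pollutant level
pollution = (pollution - pollution.mean()) / pollution.std()

# AR(1) latent trend in place of a smooth function of calendar time
trend = np.empty(T)
trend[0] = 0.0
for t in range(1, T):
    trend[t] = 0.95 * trend[t - 1] + rng.normal(scale=0.05)

# Pollution effect allowed to evolve slowly over time (random walk)
beta = np.cumsum(rng.normal(scale=0.002, size=T)) + 0.05

log_mu = np.log(30.0) + trend + beta * pollution      # baseline of roughly 30 deaths per day
deaths = rng.poisson(np.exp(log_mu))
print("simulated daily deaths:", deaths[:10])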
With the availability of massive amounts of data from electronic health records and registry databases, incorporating time-varying patient information to improve risk prediction has attracted great attention. To exploit the growing amount of predictor information over time, we develop a unified framework for landmark prediction using survival tree ensembles, where an updated prediction can be performed when new information becomes available. Compared to conventional landmark prediction, our framework enjoys great flexibility in that the landmark times can be subject-specific and triggered by an intermediate clinical event. Moreover, the nonparametric approach circumvents the thorny issue of model incompatibility at different landmark times. When both the longitudinal predictors and the outcome event time are subject to right censoring, existing tree-based approaches cannot be directly applied. To tackle the analytical challenges, we consider a risk-set-based ensemble procedure by averaging martingale estimating equations from individual trees. Extensive simulation studies are conducted to evaluate the performance of our methods. The methods are applied to the Cystic Fibrosis Foundation Patient Registry (CFFPR) data to perform dynamic prediction of lung disease in cystic fibrosis patients and to identify important prognostic factors.
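A bare-bones illustration of the landmarking idea (restrict to subjects still at risk at the landmark time, then predict residual survival from the predictor information available at that time) is sketched below. It uses a common landmark time, ignores censoring after the landmark, and stands in for neither the tree ensemble nor the martingale estimating equations of the actual method; all values are invented.

import numpy as np

rng = np.random.default_rng(3)
n = 500

biomarker = rng.normal(size=n)                      # predictor available at the landmark
event_time = rng.exponential(scale=np.exp(1.0 - 0.5 * biomarker))
censor_time = rng.uniform(0, 4, size=n)
time = np.minimum(event_time, censor_time)
status = (event_time <= censor_time).astype(int)

landmark = 1.0                                      # common landmark time, for simplicity
at_risk = time > landmark                           # risk set: still event-free and uncensored

# Crude landmark prediction: empirical chance of surviving one further time unit,
# split by the biomarker (censoring after the landmark is ignored here; the paper
# handles it via risk-set-based martingale estimating equations averaged over trees).
for label, idx in [("low biomarker", at_risk & (biomarker < 0)),
                   ("high biomarker", at_risk & (biomarker >= 0))]:
    surv = np.mean(time[idx] > landmark + 1.0)
    print(f"{label}: crude P(T > {landmark + 1.0} | T > {landmark}) ~ {surv:.2f}")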
The Greenland Sea is an important breeding ground for harp and hooded seals. Estimates of the annual seal pup production are critical factors in the abundance estimation needed for management of the species. These estimates are usually based on counts from aerial photographic surveys. However, only a minor part of the whelping region can be photographed, due to its large extent. To estimate the total seal pup production, we propose a Bayesian hierarchical modeling approach motivated by viewing the seal pup appearances as a realization of a log-Gaussian Cox process using covariate information from satellite imagery as a proxy for ice thickness. For inference, we utilize the stochastic partial differential equation (SPDE) module of the integrated nested Laplace approximation (INLA) framework. In a case study using survey data from 2012, we compare our results with existing methodology in a comprehensive cross-validation study. The results of the study indicate that our method improves local estimation performance, and that the increased prediction uncertainty of our method is required to obtain calibrated count predictions. This suggests that the sampling density of the survey design may not be sufficient to obtain reliable estimates of the seal pup production.
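To make the log-Gaussian Cox process idea concrete, the sketch below simulates pup counts on a grid whose log intensity combines a satellite covariate with a very crudely approximated Gaussian random field, and contrasts the true total with a naive estimate from a partial survey. Grid size, survey coverage, and coefficients are all invented, and no SPDE/INLA fitting is attempted.

import numpy as np

rng = np.random.default_rng(4)
grid = 100                                       # 100 x 100 cells over the whelping region

ice_proxy = rng.normal(size=(grid, grid))        # satellite covariate (proxy for ice thickness)
coarse = rng.normal(scale=0.5, size=(grid // 10, grid // 10))
field = np.kron(coarse, np.ones((10, 10)))       # blocky stand-in for a Gaussian random field

log_intensity = -3.0 + 0.8 * ice_proxy + field   # log-Gaussian intensity surface
counts = rng.poisson(np.exp(log_intensity))      # seal pup counts per grid cell

surveyed = rng.random(size=(grid, grid)) < 0.1   # only ~10% of cells are photographed
naive_total = counts[surveyed].sum() / surveyed.mean()
print("true total:", counts.sum(), "| naive survey-based estimate:", round(naive_total))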
Sensor noise sources cause differences in the signal recorded across pixels in a single image and across multiple images. This paper presents a Bayesian approach to decomposing and characterizing the sensor noise sources involved in imaging with digital cameras. A Bayesian probabilistic model based on the (theoretical) model for noise sources in image sensing is fitted to a set of time series of images with different reflectances and wavelengths under controlled lighting conditions. The image sensing model is a complex model, with several interacting components dependent on reflectance and wavelength. The ability of the Bayesian approach to define conditional dependencies among parameters in a fully probabilistic model and to propagate all sources of uncertainty through inference makes the Bayesian modeling framework more attractive and powerful than classical methods for approaching the image sensing model. A feasible correspondence of noise parameters to their expected theoretical behaviors and well-calibrated posterior predictive distributions with a small root mean square error for model predictions have been achieved in this study, showing that the proposed model accurately approximates the image sensing model. The Bayesian approach could be extended to formulate further components aimed at identifying even more specific parameters of the imaging process.
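The standard "photon transfer" decomposition gives a flavour of the signal-dependent versus signal-independent noise components this abstract is concerned with: across repeated exposures, the pixel variance grows linearly with the mean, with slope set by the conversion gain and intercept set by the read noise. The sketch below simulates and recovers these two components; the gain and read-noise values are invented, and this least-squares fit is not the paper's Bayesian model.

import numpy as np

rng = np.random.default_rng(5)

gain = 2.0                            # conversion gain, DN per electron (assumed)
read_sigma = 3.0                      # read noise in DN (assumed)
levels = np.linspace(50, 2000, 25)    # mean electron counts at different light levels
n_frames = 200                        # repeated exposures per level

means, variances = [], []
for lam in levels:
    electrons = rng.poisson(lam, size=n_frames)                              # photon shot noise
    dn = gain * electrons + rng.normal(scale=read_sigma, size=n_frames)      # plus readout noise
    means.append(dn.mean())
    variances.append(dn.var(ddof=1))

# Photon transfer relationship: var(DN) ~ gain * mean(DN) + read_sigma**2,
# so a straight-line fit recovers the two noise components.
slope, intercept = np.polyfit(means, variances, deg=1)
print(f"estimated gain ~ {slope:.2f}, estimated read noise ~ {np.sqrt(max(intercept, 0)):.2f} DN")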