No Arabic abstract
Many studies have reported associations between later-life cognition and socioeconomic position in childhood, young adulthood, and mid-life. However, the vast majority of these studies are unable to quantify how these associations vary over time and with respect to several demographic factors. Varying coefficient (VC) models, which treat the covariate effects in a linear model as nonparametric functions of additional effect modifiers, offer an appealing way to overcome these limitations. Unfortunately, state-of-the-art VC modeling methods require computationally prohibitive parameter tuning or make restrictive assumptions about the functional form of the covariate effects. In response, we propose VCBART, which estimates the covariate effects in a VC model using Bayesian Additive Regression Trees. With simple default hyperparameter settings, VCBART outperforms existing methods in terms of covariate effect estimation and prediction. Using VCBART, we predict the cognitive trajectories of 4,167 subjects from the Health and Retirement Study using multiple measures of socioeconomic position and physical health. We find that socioeconomic position in childhood and young adulthood have small effects that do not vary with age. In contrast, the effects of measures of mid-life physical health tend to vary with respect to age, race, and marital status. An R package implementing VC-BART is available at https://github.com/skdeshpande91/VCBART
Bayesian Additive Regression Trees (BART) are non-parametric models that can capture complex exogenous variable effects. In any regression problem, it is often of interest to learn which variables are most active. Variable activity in BART is usually measured by counting the number of times a tree splits for each variable. Such one-way counts have the advantage of fast computations. Despite their convenience, one-way counts have several issues. They are statistically unjustified, cannot distinguish between main effects and interaction effects, and become inflated when measuring interaction effects. An alternative method well-established in the literature is Sobol indices, a variance-based global sensitivity analysis technique. However, these indices often require Monte Carlo integration, which can be computationally expensive. This paper provides analytic expressions for Sobol indices for BART posterior samples. These expressions are easy to interpret and are computationally feasible. Furthermore, we will show a fascinating connection between first-order (main-effects) Sobol indices and one-way counts. We also introduce a novel ranking method, and use this to demonstrate that the proposed indices preserve the Sobol-based rank order of variable importance. Finally, we compare these methods using analytic test functions and the En-ROADS climate impacts simulator.
We develop a Bayesian sum-of-trees model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior. Effectively, BART is a nonparametric Bayesian regression approach which uses dimensionally adaptive random basis elements. Motivated by ensemble methods in general, and boosting algorithms in particular, BART is defined by a statistical model: a prior and a likelihood. This approach enables full posterior inference including point and interval estimates of the unknown regression function as well as the marginal effects of potential predictors. By keeping track of predictor inclusion frequencies, BART can also be used for model-free variable selection. BARTs many features are illustrated with a bake-off against competing methods on 42 different data sets, with a simulation experiment and on a drug discovery classification problem.
We introduce a new method of Bayesian wavelet shrinkage for reconstructing a signal when we observe a noisy version. Rather than making the common assumption that the wavelet coefficients of the signal are independent, we allow for the possibility that they are locally correlated in both location (time) and scale (frequency). This leads us to a prior structure which is analytically intractable, but it is possible to draw independent samples from a close approximation to the posterior distribution by an approach based on Coupling From The Past.
Nonparametric varying coefficient (NVC) models are useful for modeling time-varying effects on responses that are measured repeatedly. In this paper, we introduce the nonparametric varying coefficient spike-and-slab lasso (NVC-SSL) for Bayesian estimation and variable selection in NVC models. The NVC-SSL simultaneously selects and estimates the significant varying coefficients, while also accounting for temporal correlations. Our model can be implemented using a computationally efficient expectation-maximization (EM) algorithm. We also employ a simple method to make our model robust to misspecification of the temporal correlation structure. In contrast to frequentist approaches, little is known about the large-sample properties for Bayesian NVC models when the dimension of the covariates $p$ grows much faster than sample size $n$. In this paper, we derive posterior contraction rates for the NVC-SSL model when $p gg n$ under both correct specification and misspecification of the temporal correlation structure. Thus, our results are derived under weaker assumptions than those seen in other high-dimensional NVC models which assume independent and identically distributed (iid) random errors. Finally, we illustrate our methodology through simulation studies and data analysis. Our method is implemented in the publicly available R package NVCSSL.
We develop a new Bayesian modelling framework for the class of higher-order, variable-memory Markov chains, and introduce an associated collection of methodological tools for exact inference with discrete time series. We show that a version of the context tree weighting algorithm can compute the prior predictive likelihood exactly (averaged over both models and parameters), and two related algorithms are introduced, which identify the a posteriori most likely models and compute their exact posterior probabilities. All three algorithms are deterministic and have linear-time complexity. A family of variable-dimension Markov chain Monte Carlo samplers is also provided, facilitating further exploration of the posterior. The performance of the proposed methods in model selection, Markov order estimation and prediction is illustrated through simulation experiments and real-world applications with data from finance, genetics, neuroscience, and animal communication. The associated algorithms are implemented in the R package BCT.