No Arabic abstract
Estimation of the long-term health effects of air pollution is a challenging task, especially when modelling small-area disease incidence data in an ecological study design. The challenge comes from the unobserved underlying spatial correlation structure in these data, which is accounted for using random effects modelled by a globally smooth conditional autoregressive model. These smooth random effects confound the effects of air pollution, which are also globally smooth. To avoid this collinearity a Bayesian localised conditional autoregressive model is developed for the random effects. This localised model is flexible spatially, in the sense that it is not only able to model step changes in the random effects surface, but also is able to capture areas of spatial smoothness in the study region. This methodological development allows us to improve the estimation performance of the covariate effects, compared to using traditional conditional auto-regressive models. These results are established using a simulation study, and are then illustrated with our motivating study on air pollution and respiratory ill health in Greater Glasgow, Scotland in 2010. The model shows substantial health effects of particulate matter air pollution and income deprivation, whose effects have been consistently attenuated by the currently available globally smooth models.
The relationship between short-term exposure to air pollution and mortality or morbidity has been the subject of much recent research, in which the standard method of analysis uses Poisson linear or additive models. In this paper we use a Bayesian dynamic generalised linear model (DGLM) to estimate this relationship, which allows the standard linear or additive model to be extended in two ways: (i) the long-term trend and temporal correlation present in the health data can be modelled by an autoregressive process rather than a smooth function of calendar time; (ii) the effects of air pollution are allowed to evolve over time. The efficacy of these two extensions are investigated by applying a series of dynamic and non-dynamic models to air pollution and mortality data from Greater London. A Bayesian approach is taken throughout, and a Markov chain monte carlo simulation algorithm is presented for inference. An alternative likelihood based analysis is also presented, in order to allow a direct comparison with the only previous analysis of air pollution and health data using a DGLM.
Microbiome data analyses require statistical models that can simultaneously decode microbes reactions to the environment and interactions among microbes. While a multiresponse linear regression model seems like a straightforward solution, we argue that treating it as a graphical model is flawed given that the regression coefficient matrix does not encode the conditional dependence structure between response and predictor nodes because it does not represent the adjacency matrix. This observation is especially important in biological settings when we have prior knowledge on the edges from specific experimental interventions that can only be properly encoded under a conditional dependence model. Here, we propose a chain graph model with two sets of nodes (predictors and responses) whose solution yields a graph with edges that indeed represent conditional dependence and thus, agrees with the experimenters intuition on the average behavior of nodes under treatment. The solution to our model is sparse via Bayesian LASSO and is also guaranteed to be the sparse solution to a Conditional Auto-Regressive (CAR) model. In addition, we propose an adaptive extension so that different shrinkage can be applied to different edges to incorporate edge-specific prior knowledge. Our model is computationally inexpensive through an efficient Gibbs sampling algorithm and can account for binary, counting, and compositional responses via appropriate hierarchical structure. We apply our model to a human gut and a soil microbial compositional datasets and we highlight that CAR-LASSO can estimate biologically meaningful network structures in the data. The CAR-LASSO software is available as an R package at https://github.com/YunyiShen/CAR-LASSO.
Count-valued time series data are routinely collected in many application areas. We are particularly motivated to study the count time series of daily new cases, arising from COVID-19 spread. We propose two Bayesian models, a time-varying semiparametric AR(p) model for count and then a time-varying INGARCH model considering the rapid changes in the spread. We calculate posterior contraction rates of the proposed Bayesian methods with respect to average Hellinger metric. Our proposed structures of the models are amenable to Hamiltonian Monte Carlo (HMC) sampling for efficient computation. We substantiate our methods by simulations that show superiority compared to some of the close existing methods. Finally we analyze the daily time series data of newly confirmed cases to study its spread through different government interventions.
Statistical techniques used in air pollution modelling usually lack the possibility to understand which predictors affect air pollution in which functional form; and are not able to regress on exceedances over certain thresholds imposed by authorities directly. The latter naturally induce conditional quantiles and reflect the seriousness of particular events. In the present paper we focus on this important aspect by developing quantile regression models further. We propose a general Bayesian effect selection approach for additive quantile regression within a highly interpretable framework. We place separate normal beta prime spike and slab priors on the scalar importance parameters of effect parts and implement a fast Gibbs sampling scheme. Specifically, it enables to study quantile-specific covariate effects, allows these covariates to be of general functional form using additive predictors, and facilitates the analysts decision whether an effect should be included linearly, non-linearly or not at all in the quantiles of interest. In a detailed analysis on air pollution data in Madrid (Spain) we find the added value of modelling extreme nitrogen dioxide (NO2) concentrations and how thresholds are driven differently by several climatological variables and traffic as a spatial proxy. Our results underpin the need of enhanced statistical models to support short-term decisions and enable local authorities to mitigate or even prevent exceedances of NO2 concentration limits.
Diffusion MRI is a neuroimaging technique measuring the anatomical structure of tissues. Using diffusion MRI to construct the connections of tissues, known as fiber tracking, is one of the most important uses of diffusion MRI. Many techniques are available recently but few properly quantify statistical uncertainties. In this paper, we propose a directed acyclic graph auto-regressive model of positive definite matrices and apply a probabilistic fiber tracking algorithm. We use both real data analysis and numerical studies to demonstrate our proposal.