No Arabic abstract
Environmental processes resolved at a sufficiently small scale in space and time will inevitably display non-stationary behavior. Such processes are both challenging to model and computationally expensive when the data size is large. Instead of modeling the global non-stationarity explicitly, local models can be applied to disjoint regions of the domain. The choice of the size of these regions is dictated by a bias-variance trade-off; large regions will have smaller variance and larger bias, whereas small regions will have higher variance and smaller bias. From both the modeling and computational point of view, small regions are preferable to better accommodate the non-stationarity. However, in practice, large regions are necessary to control the variance. We propose a novel Bayesian three-step approach that allows for smaller regions without compromising the increase of the variance that would follow. We are able to propagate the uncertainty from one step to the next without issues caused by reusing the data. The improvement in inference also results in improved prediction, as our simulated example shows. We illustrate this new approach on a data set of simulated high-resolution wind speed data over Saudi Arabia.
The aim of this paper is to develop a class of spatial transformation models (STM) to spatially model the varying association between imaging measures in a three-dimensional (3D) volume (or 2D surface) and a set of covariates. Our STMs include a varying Box-Cox transformation model for dealing with the issue of non-Gaussian distributed imaging data and a Gaussian Markov Random Field model for incorporating spatial smoothness of the imaging data. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. Simulations and real data analysis demonstrate that the STM significantly outperforms the voxel-wise linear model with Gaussian noise in recovering meaningful geometric patterns. Our STM is able to reveal important brain regions with morphological changes in children with attention deficit hyperactivity disorder.
Environmental data may be large due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates with nonlinear relationships, whereas spatial regression, when using reduced rank methods, has a reputation for good predictive performance when using many records that are spatially autocorrelated. In this study, we compare these two techniques using a data set containing the macroinvertebrate multimetric index (MMI) at 1859 stream sites with over 200 landscape covariates. A primary application is mapping MMI predictions and prediction errors at 1.1 million perennial stream reaches across the conterminous United States. For the spatial regression model, we develop a novel transformation procedure that estimates Box-Cox transformations to linearize covariate relationships and handles possibly zero-inflated covariates. We find that the spatial regression model with transformations, and a subsequent selection of significant covariates, has cross-validation performance slightly better than random forests. We also find that prediction interval coverage is close to nominal for each method, but that spatial regression prediction intervals tend to be narrower and have less variability than quantile regression forest prediction intervals. A simulation study is used to generalize results and clarify advantages of each modeling approach.
Several methods have been proposed in the spatial statistics literature for the analysis of big data sets in continuous domains. However, new methods for analyzing high-dimensional areal data are still scarce. Here, we propose a scalable Bayesian modeling approach for smoothing mortality (or incidence) risks in high-dimensional data, that is, when the number of small areas is very large. The method is implemented in the R add-on package bigDM. Model fitting and inference is based on the idea of divide and conquer and use integrated nested Laplace approximations and numerical integration. We analyze the proposals empirical performance in a comprehensive simulation study that consider two model-free settings. Finally, the methodology is applied to analyze male colorectal cancer mortality in Spanish municipalities showing its benefits with regard to the standard approach in terms of goodness of fit and computational time.
Multi-parametric magnetic resonance imaging (mpMRI) plays an increasingly important role in the diagnosis of prostate cancer. Various computer-aided detection algorithms have been proposed for automated prostate cancer detection by combining information from various mpMRI data components. However, there exist other features of mpMRI, including the spatial correlation between voxels and between-patient heterogeneity in the mpMRI parameters, that have not been fully explored in the literature but could potentially improve cancer detection if leveraged appropriately. This paper proposes novel voxel-wise Bayesian classifiers for prostate cancer that account for the spatial correlation and between-patient heterogeneity in mpMRI. Modeling the spatial correlation is challenging due to the extreme high dimensionality of the data, and we consider three computationally efficient approaches using Nearest Neighbor Gaussian Process (NNGP), knot-based reduced-rank approximation, and a conditional autoregressive (CAR) model, respectively. The between-patient heterogeneity is accounted for by adding a subject-specific random intercept on the mpMRI parameter model. Simulation results show that properly modeling the spatial correlation and between-patient heterogeneity improves classification accuracy. Application to in vivo data illustrates that classification is improved by spatial modeling using NNGP and reduced-rank approximation but not the CAR model, while modeling the between-patient heterogeneity does not further improve our classifier. Among our proposed models, the NNGP-based model is recommended considering its robust classification accuracy and high computational efficiency.
Generalized autoregressive moving average (GARMA) models are a class of models that was developed for extending the univariate Gaussian ARMA time series model to a flexible observation-driven model for non-Gaussian time series data. This work presents Bayesian approach for GARMA models with Poisson, binomial and negative binomial distributions. A simulation study was carried out to investigate the performance of Bayesian estimation and Bayesian model selection criteria. Also three real datasets were analysed using the Bayesian approach on GARMA models.