Binary regression models are commonly used in disciplines such as epidemiology and ecology to determine how spatial covariates influence individuals. In many studies, binary data are shared in a spatially aggregated form to protect privacy. For example, rather than reporting the location and result for each individual that was tested for a disease, researchers may report that a disease was detected or not detected within geopolitical units. Often, the spatial aggregation process obscures the values of response variables, spatial covariates, and locations of each individual, which makes recovering individual-level inference difficult. We show that applying a series of transformations, including a change of support, to a bivariate point process model allows researchers to recover individual-level inference for spatial covariates from spatially aggregated binary data. The series of transformations preserves the convenient interpretation of desirable binary regression models that are commonly applied to individual-level data. Using a simulation experiment, we compare the performance of our proposed method under varying types of spatial aggregation against the performance of standard approaches using the original individual-level data. We illustrate our method by modeling individual-level probability of infection using a data set that has been aggregated to protect an at-risk and endangered species of bats. Our simulation experiment and data illustration demonstrate the utility of the proposed method when access to original non-aggregated data is impractical or prohibited.
It has become increasingly common to collect high-dimensional binary data; for example, with the emergence of new sampling techniques in ecology. In smaller dimensions, multivariate probit (MVP) models are routinely used for inferences. However, algorithms for fitting such models face issues in scaling up to high dimensions due to the intractability of the likelihood, involving an integral over a multivariate normal distribution having no analytic form. Although a variety of algorithms have been proposed to approximate this intractable integral, these approaches are difficult to implement and/or inaccurate in high dimensions. We propose a two-stage Bayesian approach for inference on model parameters while taking care of the uncertainty propagation between the stages. We use the special structure of latent Gaussian models to reduce the highly expensive computation involved in joint parameter estimation to focus inference on marginal distributions of model parameters. This essentially makes the method embarrassingly parallel for both stages. We illustrate performance in simulations and applications to joint species distribution modeling in ecology.
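To make the intractable integral concrete: in a multivariate probit model, the probability of one observed binary vector is the probability that a latent multivariate normal vector falls in the matching orthant. The sketch below estimates that probability by naive Monte Carlo using only the Python standard library. It is an illustration of the integral, not the two-stage method proposed in the abstract; the function names `cholesky` and `mvp_prob` and all parameter values are hypothetical.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def mvp_prob(y, mu, corr, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(Y = y), where Y_j = 1 exactly when Z_j > 0
    and the latent vector Z is multivariate normal with mean mu and
    covariance corr."""
    rng = random.Random(seed)
    low = cholesky(corr)
    d = len(y)
    hits = 0
    for _ in range(n_draws):
        # Draw Z = mu + L e with e standard normal.
        e = [rng.gauss(0.0, 1.0) for _ in range(d)]
        z = [mu[i] + sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(d)]
        # Count the draw when every latent sign agrees with the observed y.
        if all((z[i] > 0) == bool(y[i]) for i in range(d)):
            hits += 1
    return hits / n_draws

# Two-dimensional check against the known closed form for zero means:
# P(Z1 > 0, Z2 > 0) = 1/4 + arcsin(rho) / (2 * pi).
rho = 0.5
estimate = mvp_prob([1, 1], [0.0, 0.0], [[1.0, rho], [rho, 1.0]])
exact = 0.25 + math.asin(rho) / (2 * math.pi)
```

In two dimensions the estimate can be compared with the closed form, but no such formula is available in general. The probability of any single orthant also tends to shrink as the dimension grows, so a naive estimator of this kind needs many more draws to remain accurate.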
In this article, we consider the problem of recovering the underlying trajectory when the longitudinal data are sparsely and irregularly observed and noise-contaminated. Such data are popularly analyzed with functional principal component analysis via the Principal Analysis by Conditional Estimation (PACE) method. The PACE method may sometimes be numerically unstable because it involves the inverse of the covariance matrix. We propose a sparse orthonormal approximation (SOAP) method as an alternative. It estimates the optimal empirical basis functions in the best approximation framework rather than eigen-decomposing the covariance function. The SOAP method avoids estimating the mean and covariance function, which is challenging when the assembled time points with observations for all subjects are not sufficiently dense. The SOAP method avoids the inverse of the covariance matrix, hence the computation is more stable. It does not require the functional principal component scores to follow the Gaussian distribution. We show that the SOAP estimate for the optimal empirical basis function is asymptotically consistent. The finite sample performance of the SOAP method is investigated in simulation studies in comparison with the PACE method. Our method is demonstrated by recovering the CD4 percentage curves from sparse and irregular data in the Multi-center AIDS Cohort Study.
The rstap package implements Bayesian spatial temporal aggregated predictor models in R using the probabilistic programming language Stan. A variety of distributions and link functions are supported, allowing users to fit this extension to the generalized linear model with both independent and correlated outcomes.
We propose the spatial-temporal aggregated predictor (STAP) modeling framework to address measurement and estimation issues that arise when assessing the relationship between built environment features (BEF) and health outcomes. Many BEFs can be mapped as point locations and thus traditional exposure metrics are based on the number of features within a pre-specified spatial unit. The size of the spatial unit--or spatial scale--that is most appropriate for a particular health outcome is unknown and its choice inextricably impacts the estimated health effect. A related issue is the lack of knowledge of the temporal scale--or the length of exposure time that is necessary for the BEF to render its full effect on the health outcome. The proposed STAP model enables investigators to estimate both the spatial and temporal scales for a given BEF in a data-driven fashion, thereby providing a flexible solution for measuring the relationship between outcomes and spatial proximity to point-referenced exposures. Simulation studies verify the validity of our method for estimating the scales as well as the association between availability of BEFs and health outcomes. We apply this method to estimate the spatial-temporal association between supermarkets and BMI using data from the Multi-Ethnic Atherosclerosis Study, demonstrating the method's applicability in cohort studies.
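The contrast between a count within a pre-specified spatial unit and an exposure governed by a scale parameter can be sketched as follows. The exponential distance kernel, the function names, and the distances are assumptions made for illustration; the abstract does not state which kernel the STAP model uses, and this sketch covers only the spatial scale.

```python
import math

def count_within(distances, radius):
    """Traditional metric: number of features within a fixed radius."""
    return sum(1 for d in distances if d <= radius)

def aggregated_exposure(distances, scale):
    """Distance-weighted exposure with a tunable spatial scale.

    Assumed kernel: exp(-d / scale). A larger scale lets distant
    features contribute more to the total.
    """
    return sum(math.exp(-d / scale) for d in distances)

# Hypothetical distances (km) from one subject to nearby features.
distances_km = [0.2, 0.8, 1.5, 3.0]

# The count changes abruptly with the chosen radius.
counts = {r: count_within(distances_km, r) for r in (1.0, 2.0)}

# The weighted exposure varies smoothly with the scale, so the scale
# can be treated as a parameter to estimate rather than fixed in advance.
exposures = {s: aggregated_exposure(distances_km, s) for s in (0.5, 1.0, 2.0)}
```

Because the weighted exposure is a smooth function of the scale, the scale can in principle be estimated alongside the regression coefficients, which is the data-driven choice the abstract describes.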
Predicting risks of chronic diseases has become increasingly important in clinical practice. When a prediction model is developed in a given source cohort, there is often great interest in applying the model to other cohorts. However, due to potential discrepancies in baseline disease incidence between cohorts and shifts in patient composition, the risk predicted by the original model often under- or over-estimates the risk in the new cohort. A remedy for such poorly calibrated predictions is needed for proper medical decision-making. In this article, we assume the relative risks of predictors are the same between the two cohorts, and propose a novel weighted estimating equation approach to recalibrating the projected risk for the targeted population through updating the baseline risk. The recalibration leverages knowledge about the overall survival probabilities for the disease of interest and competing events, and summary information on risk factors from the targeted population. The proposed recalibrated risk estimators gain efficiency if the risk factor distributions are the same for both the source and target cohorts, and are robust with little bias if they differ. We establish the consistency and asymptotic normality of the proposed estimators. Extensive simulation studies demonstrate that the proposed estimators perform very well in terms of robustness and efficiency in finite samples. A real data application to colorectal cancer risk prediction also illustrates that the proposed method can be used in practice for model recalibration.
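The idea of updating the baseline risk while holding relative risks fixed can be illustrated with a much simpler technique than the weighted estimating equation approach described above: intercept-only recalibration of a logistic risk model. The sketch below is a simplified stand-in, not the authors' estimator, and it ignores survival times and competing events. The function names, linear predictors, and target risk are hypothetical.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def recalibrate_intercept(linear_predictors, target_mean_risk,
                          lo=-20.0, hi=20.0, tol=1e-10):
    """Find the intercept a such that the mean of expit(a + lp) over the
    target cohort equals target_mean_risk.

    The linear predictors (the relative-risk part of the model) are left
    unchanged; only the baseline level moves. The mean risk is increasing
    in a, so bisection is sufficient.
    """
    def mean_risk(a):
        total = sum(expit(a + lp) for lp in linear_predictors)
        return total / len(linear_predictors)

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_risk(mid) < target_mean_risk:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical linear predictors for subjects in the target cohort,
# computed from coefficients fitted in the source cohort.
lps = [-0.5, 0.0, 0.3, 1.2]

# Hypothetical overall risk known for the target population.
new_intercept = recalibrate_intercept(lps, target_mean_risk=0.10)
recalibrated = [expit(new_intercept + lp) for lp in lps]
```

After recalibration the average predicted risk matches the target value, while the ordering of subjects by risk is unchanged because the linear predictors were not altered.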