We propose the spatial-temporal aggregated predictor (STAP) modeling framework to address measurement and estimation issues that arise when assessing the relationship between built environment features (BEFs) and health outcomes. Many BEFs can be mapped as point locations, and thus traditional exposure metrics are based on the number of features within a pre-specified spatial unit. The size of the spatial unit, or spatial scale, that is most appropriate for a particular health outcome is unknown, and its choice directly affects the estimated health effect. A related issue is the lack of knowledge of the temporal scale, that is, the length of exposure time necessary for the BEF to render its full effect on the health outcome. The proposed STAP model enables investigators to estimate both the spatial and temporal scales for a given BEF in a data-driven fashion, thereby providing a flexible solution for measuring the relationship between outcomes and spatial proximity to point-referenced exposures. Simulation studies verify the validity of our method for estimating the scales as well as the association between availability of BEFs and health outcomes. We apply this method to estimate the spatial-temporal association between supermarkets and BMI using data from the Multi-Ethnic Study of Atherosclerosis, demonstrating the method's applicability in cohort studies.
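To make the idea of an aggregated predictor concrete, the following is a minimal sketch of how an exposure covariate could be built from point-referenced features for one subject. The function name `stap_exposure`, the scale parameters `theta_s` and `theta_t`, and the specific kernel shapes (exponential decay in distance, saturating growth in exposure time) are assumptions chosen for illustration; the abstract does not state which kernels the STAP model uses.

```python
import math

def stap_exposure(distances, times, theta_s, theta_t):
    """Aggregate exposure over all features near one subject.

    distances: distance from the subject to each feature
    times:     length of time the subject was exposed to each feature
    theta_s:   spatial scale (hypothetical parameter name)
    theta_t:   temporal scale (hypothetical parameter name)
    """
    total = 0.0
    for d, t in zip(distances, times):
        # Assumed spatial kernel: weight decays with distance.
        spatial = math.exp(-d / theta_s)
        # Assumed temporal kernel: weight grows toward 1 with exposure time.
        temporal = 1.0 - math.exp(-t / theta_t)
        total += spatial * temporal
    return total

# A nearby feature contributes more than a distant one for the same exposure time.
near = stap_exposure([0.1], [5.0], theta_s=1.0, theta_t=2.0)
far = stap_exposure([2.0], [5.0], theta_s=1.0, theta_t=2.0)
```

In a full model the scales would be estimated jointly with the regression coefficients rather than fixed in advance, which is the data-driven aspect the abstract describes.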
We present an approach to estimate distance-dependent heterogeneous associations between point-referenced exposures to built environment characteristics and health outcomes. By estimating associations that depend non-linearly on the distance between subjects and point-referenced exposures, this method addresses the modifiable areal unit problem that is pervasive in the built environment literature. Additionally, by estimating heterogeneous effects, the method addresses the uncertain geographic context problem. The key innovation of our method is to combine ideas from the non-parametric function estimation literature and the Bayesian Dirichlet process literature. The former is used to estimate nonlinear associations between subjects' outcomes and proximate built environment features, and the latter identifies clusters within the population that have different effects. We study this method in simulations and apply our model to study heterogeneity in the association between fast food restaurant availability and weight status of children attending schools in Los Angeles, California.
Built environment features (BEFs) refer to aspects of the human-constructed environment, which may in turn support or restrict health-related behaviors and thus impact health. In this paper we are interested in understanding whether the spatial distribution and quantity of fast food restaurants (FFRs) influence the risk of obesity in schoolchildren. To achieve this goal, we propose a two-stage Bayesian hierarchical modeling framework. In the first stage, examining the position of FFRs relative to that of some reference locations (in our case, schools), we model the distances of FFRs from these reference locations as realizations of inhomogeneous Poisson processes (IPPs). With the goal of identifying representative spatial patterns of exposure to FFRs, we model the intensity functions of the IPPs from a Bayesian non-parametric viewpoint, specifying a Nested Dirichlet Process prior. The second-stage model relates exposure patterns to obesity, offering two different approaches to accommodate uncertainty in the exposure patterns estimated in the first stage. In the first approach, the odds of obesity at the school level are regressed on cluster indicators, each representing a major pattern of exposure to FFRs. In the second, we employ Bayesian Kernel Machine regression to relate the odds of obesity to the multivariate vector reporting the degree of similarity of a given school to all other schools. Our analysis of the influence of patterns of FFR occurrence on obesity among Californian schoolchildren indicates that, in 2010, among schools that are consistently assigned to a cluster, the odds of obesity were lower among 9th graders attending schools whose FFR occurrences within a 1-mile radius were most distant, compared with other schools.
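As an illustration of the first-stage building block, the sketch below simulates distances from a reference location as a realization of an inhomogeneous Poisson process on an interval, using the standard thinning technique. The function name `simulate_ipp`, its arguments, and the example intensity are hypothetical; the sketch shows only what such a realization looks like and does not implement the Nested Dirichlet Process prior or any part of the paper's estimation procedure.

```python
import random

def simulate_ipp(intensity, upper, lam_max, rng):
    """Simulate points of an inhomogeneous Poisson process on [0, upper].

    intensity: function r -> intensity at distance r (must be <= lam_max)
    upper:     right end of the interval, e.g. a 1-mile radius
    lam_max:   upper bound on the intensity over [0, upper]
    rng:       a random.Random instance, for reproducibility
    """
    points = []
    r = 0.0
    while True:
        # Propose the next point from a homogeneous process with rate lam_max.
        r += rng.expovariate(lam_max)
        if r > upper:
            return points
        # Keep the proposal with probability intensity(r) / lam_max.
        if rng.random() * lam_max < intensity(r):
            points.append(r)

# Example: intensity increasing with distance, so features tend to lie farther away.
distances = simulate_ipp(lambda r: 20.0 * r, upper=1.0, lam_max=20.0,
                         rng=random.Random(0))
```

Different shapes of the intensity function correspond to different exposure patterns, for example features concentrated close to a school versus near the edge of the radius.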
The rstap package implements Bayesian spatial-temporal aggregated predictor models in R using the probabilistic programming language Stan. A variety of distributions and link functions are supported, allowing users to fit this extension to the generalized linear model with both independent and correlated outcomes.
Knockoffs provide a general framework for controlling the false discovery rate when performing variable selection. Much of the knockoffs literature focuses on theoretical challenges, and we recognize a need for bringing some of the current ideas into practice. In this paper we propose a sequential algorithm for generating knockoffs when the underlying data consist of both continuous and categorical (factor) variables. Further, we present a heuristic multiple knockoffs approach that offers a practical assessment of how robust the knockoff selection process is for a given data set. We conduct extensive simulations to validate the performance of the proposed methodology. Finally, we demonstrate the utility of the methods on a large clinical data pool of more than 2,000 patients with psoriatic arthritis evaluated in 4 clinical trials with an IL-17A inhibitor, secukinumab (Cosentyx), where we determine prognostic factors of a well-established clinical outcome. The analyses presented in this paper could provide a wide range of applications to commonly encountered data sets in medical practice and other fields where variable selection is of particular interest.
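For readers unfamiliar with how knockoffs control the false discovery rate, the sketch below shows the selection step commonly used in the general knockoff framework (the "knockoff+" threshold). It assumes that a statistic `w[j]` has already been computed for each variable, with large positive values indicating that the original variable looks more important than its knockoff copy. The function names are illustrative, and the sketch does not cover the sequential knockoff generation or the multiple knockoffs heuristic proposed in the paper.

```python
def knockoff_plus_threshold(w, q):
    """Smallest t with (1 + #{w_j <= -t}) / max(1, #{w_j >= t}) <= q."""
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        # Estimated false discovery proportion at threshold t.
        if (1 + negatives) / max(1, positives) <= q:
            return t
    # No threshold meets the target: select nothing.
    return float("inf")

def knockoff_select(w, q):
    """Indices of variables whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]

# Example with four variables and a target FDR of 0.5.
selected = knockoff_select([3.0, 2.0, 1.0, -0.5], q=0.5)
```

Because the knockoff copies are random, repeating the whole procedure can yield different selections, which is the kind of instability a multiple knockoffs assessment is meant to expose.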
When fitting statistical models, some predictors are often found to be correlated with each other and to function together. Many group variable selection methods have been developed to select the groups of predictors that are closely related to a continuous or categorical response. These existing methods usually assume that the group structure is known in advance, for example, variables with similar practical meaning, or dummy variables created from categorical data. In practice, however, the exact group structure is rarely known, especially when the number of variables is large. As a result, the group variable selection results may suffer. To address this challenge, we propose a two-stage approach that combines a variable clustering stage and a group variable selection stage. The variable clustering stage uses information from the data to find a group structure, which improves the performance of existing group variable selection methods. For ultrahigh-dimensional data, where the number of predictors is much larger than the number of observations, we incorporate a variable screening method in the first stage and show the advantages of such an approach. In this article, we compare and discuss the performance of four existing group variable selection methods under different simulation models, with and without the variable clustering stage. The two-stage method shows better performance in terms of prediction accuracy as well as accuracy in selecting active predictors. An athletes data set is also used to show the advantages of the proposed method.
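The variable clustering stage can be pictured with a small sketch. Here predictors are placed in the same group whenever their absolute Pearson correlation reaches a threshold, with groups merged transitively. This clustering rule, the function names, and the default threshold of 0.8 are assumptions made for illustration; the abstract does not specify which clustering method the authors use. The resulting groups would then be passed to a group variable selection method in the second stage.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cluster_variables(columns, threshold=0.8):
    """Group column indices whose |correlation| >= threshold (transitively)."""
    p = len(columns)
    parent = list(range(p))  # union-find forest over the predictors

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Four predictors: 0 and 1 move together, 2 and 3 are strongly negatively related.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.1],
    [1.0, -1.0, 1.0, -1.0],
    [-1.0, 1.0, -1.0, 1.2],
]
groups = cluster_variables(columns)
```

The pairwise loop costs on the order of p squared correlations, which is one reason a screening step that first reduces the number of predictors is attractive in the ultrahigh-dimensional setting.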