We present an approach to estimate distance-dependent heterogeneous associations between point-referenced exposures to built environment characteristics and health outcomes. By estimating associations that depend non-linearly on the distance between subjects and point-referenced exposures, this method addresses the modifiable areal unit problem that is pervasive in the built environment literature. Additionally, by estimating heterogeneous effects, the method also addresses the uncertain geographic context problem. The key innovation of our method is to combine ideas from the non-parametric function estimation literature and the Bayesian Dirichlet process literature. The former is used to estimate nonlinear associations between subjects' outcomes and proximate built environment features, and the latter identifies clusters within the population that have different effects. We study this method in simulations and apply our model to study heterogeneity in the association between fast food restaurant availability and weight status of children attending schools in Los Angeles, California.
We propose the spatial-temporal aggregated predictor (STAP) modeling framework to address measurement and estimation issues that arise when assessing the relationship between built environment features (BEFs) and health outcomes. Many BEFs can be mapped as point locations, and thus traditional exposure metrics are based on the number of features within a pre-specified spatial unit. The size of the spatial unit--or spatial scale--that is most appropriate for a particular health outcome is unknown, and its choice directly affects the estimated health effect. A related issue is the lack of knowledge of the temporal scale--or the length of exposure time that is necessary for the BEF to render its full effect on the health outcome. The proposed STAP model enables investigators to estimate both the spatial and temporal scales for a given BEF in a data-driven fashion, thereby providing a flexible solution for measuring the relationship between outcomes and spatial proximity to point-referenced exposures. Simulation studies verify the validity of our method for estimating the scales as well as the association between availability of BEFs and health outcomes. We apply this method to estimate the spatial-temporal association between supermarkets and BMI using data from the Multi-Ethnic Study of Atherosclerosis, demonstrating the method's applicability in cohort studies.
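To illustrate why the choice of spatial unit matters, the following minimal sketch contrasts a traditional buffer count with a distance-weighted exposure. The exponential kernel, the function names, and the toy coordinates are assumptions made for illustration only; they are not taken from the STAP model itself, which estimates the scale from data rather than fixing it.

```python
import math

def buffer_count(subject, features, radius):
    """Traditional metric: number of features within `radius` of the subject."""
    return sum(1 for f in features if math.dist(subject, f) <= radius)

def decay_exposure(subject, features, scale):
    """Distance-weighted exposure using a hypothetical kernel exp(-d / scale).

    `scale` plays the role of a spatial scale; here it is fixed by hand,
    whereas a data-driven approach would estimate it.
    """
    return sum(math.exp(-math.dist(subject, f) / scale) for f in features)

# Toy data (hypothetical): one subject and three features at distances 0.5, 1.0, 5.0.
subject = (0.0, 0.0)
features = [(0.3, 0.4), (0.6, 0.8), (3.0, 4.0)]

# The buffer count changes with the pre-specified radius.
counts = {r: buffer_count(subject, features, r) for r in (0.75, 1.5, 6.0)}

# The weighted exposure changes smoothly with the scale parameter.
exposures = {s: decay_exposure(subject, features, s) for s in (0.5, 2.0)}
```

The same three features yield a different count for each radius, which is the sensitivity to a pre-specified spatial unit that motivates estimating the scale instead.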
Faltering growth among children is a nutritional problem prevalent in low- and middle-income countries; it is generally defined as a slower rate of growth compared to a reference healthy population of the same age and gender. As faltering is closely associated with reduced physical, intellectual and economic productivity potential, it is important to identify children with faltering growth and to characterise different growth patterns so that targeted treatments can be designed and administered. We introduce a multiclass classification model for growth trajectories that flexibly extends a current classification approach called the broken stick model, which is a piecewise linear model with breaks at fixed knot locations. Heterogeneity in growth patterns among children is captured using mixture distributed random effects, whereby the mixture components determine the classification of children into subgroups. The mixture distribution is modelled using a Dirichlet process prior, which avoids the need to choose the number of mixture components in advance and allows it to be driven by the complexity of the data. Because children have individual differences in the onset of growth stages, we introduce child-specific random change points. Simulation results show that the random change point model outperforms the broken stick model because it has fewer restrictions on knot locations. We illustrate our model on a longitudinal birth cohort from the Healthy Birth, Growth and Development knowledge integration project funded by the Bill and Melinda Gates Foundation. Analysis reveals nine subgroups of children within the population which exhibit varying faltering trends between birth and age one.
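The piecewise linear structure of a broken stick model can be sketched as follows. The parameterisation (an initial slope plus a slope change at each knot), the function name, and the numbers are assumptions chosen for illustration; the sketch covers only the mean curve, not the mixture-distributed random effects or the Dirichlet process prior.

```python
def broken_stick(age, knots, intercept, slopes):
    """Piecewise linear mean curve with breaks at the given knots.

    slopes[0] is the slope before the first knot; slopes[k + 1] is the
    change in slope that takes effect after knots[k] (assumed parameterisation).
    """
    value = intercept + slopes[0] * age
    for knot, change in zip(knots, slopes[1:]):
        value += change * max(age - knot, 0.0)
    return value

# Fixed knot at age 0.5 (hypothetical values): slope 8 before the knot, 4 after.
fixed = [broken_stick(a, [0.5], 3.0, [8.0, -4.0]) for a in (0.0, 0.5, 1.0)]

# A child-specific change point amounts to passing a different knot per child.
early_changer = broken_stick(1.0, [0.25], 3.0, [8.0, -4.0])
```

Allowing the knot to vary by child, as in the second call, is the kind of flexibility that fixed knot locations rule out.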
The partial (ceteris paribus) effects of interest in nonlinear and interactive linear models are heterogeneous as they can vary dramatically with the underlying observed or unobserved covariates. Despite the apparent importance of heterogeneity, a common practice in modern empirical work is to largely ignore it by reporting average partial effects (or, at best, average effects for some groups). While average effects provide very convenient scalar summaries of typical effects, by definition they fail to reflect the entire variety of the heterogeneous effects. In order to discover these effects much more fully, we propose to estimate and report sorted effects -- a collection of estimated partial effects sorted in increasing order and indexed by percentiles. By construction the sorted effect curves completely represent and help visualize the range of the heterogeneous effects in one plot. They are as convenient and easy to report in practice as the conventional average partial effects. They also serve as a basis for classification analysis, where we divide the observational units into most or least affected groups and summarize their characteristics. We provide a quantification of uncertainty (standard errors and confidence bands) for the estimated sorted effects and related classification analysis, and provide confidence sets for the most and least affected groups. The derived statistical results rely on establishing key, new mathematical results on Hadamard differentiability of a multivariate sorting operator and a related classification operator, which are of independent interest. We apply the sorted effects method and classification analysis to demonstrate several striking patterns in the gender wage gap.
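As a minimal sketch of the sorted effects idea, the code below computes unit-level partial effects of a binary regressor in a logistic model, sorts them, and indexes them by percentile. The logistic specification, the coefficient values, and the function names are assumptions for illustration, and the sketch omits the standard errors and confidence bands.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def partial_effects(xs, b0, b_d, b_x):
    """Effect of switching a binary regressor from 0 to 1, for each covariate value x."""
    return [logistic(b0 + b_d + b_x * x) - logistic(b0 + b_x * x) for x in xs]

def sorted_effects(effects, percentiles):
    """Empirical quantiles of the effects, indexed by integer percentiles in 1..100."""
    ordered = sorted(effects)
    n = len(ordered)
    result = {}
    for p in percentiles:
        rank = (p * n + 99) // 100  # integer ceiling of p * n / 100
        result[p] = ordered[max(rank, 1) - 1]
    return result

# Hypothetical covariate values and coefficients.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
effects = partial_effects(xs, b0=0.0, b_d=1.0, b_x=1.0)

average_effect = sum(effects) / len(effects)       # conventional scalar summary
curve = sorted_effects(effects, [20, 50, 100])     # sorted effect curve at three percentiles
```

The average partial effect is a single number inside the range spanned by the curve, whereas the curve shows how widely the unit-level effects differ.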
Analyses of environmental phenomena are often concerned with understanding unlikely events such as floods, heatwaves, droughts or high concentrations of pollutants. Yet the majority of the causal inference literature has focused on modelling means, rather than (possibly high) quantiles. We define a general estimator of the population quantile treatment (or exposure) effects (QTE) -- the weighted QTE (WQTE) -- of which the population QTE is a special case, along with a general class of balancing weights incorporating the propensity score. Asymptotic properties of the proposed WQTE estimators are derived. We further propose and compare propensity score regression and two weighted methods based on these balancing weights to understand the causal effect of an exposure on quantiles, allowing for the exposure to be binary, discrete or continuous. Finite sample behavior of the three estimators is studied in simulation. The proposed methods are applied to data taken from the Bavarian Danube catchment area to estimate the 95% QTE of phosphorus on copper concentration in the river.
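For the binary exposure case, a weighted quantile contrast can be sketched as below. The sketch assumes inverse propensity weights, which are one familiar choice of balancing weights and are not necessarily the weights proposed here, and it takes the propensity scores as known inputs. All names and data are hypothetical.

```python
def weighted_quantile(values, weights, tau):
    """Smallest value whose cumulative weight share reaches tau."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= tau * total:
            return value
    return pairs[-1][0]

def ipw_qte(y, a, e, tau):
    """Quantile contrast between exposed (a == 1) and unexposed (a == 0) units.

    Weights are 1 / e for exposed and 1 / (1 - e) for unexposed units,
    where e is an assumed-known propensity score.
    """
    y1 = [yi for yi, ai in zip(y, a) if ai == 1]
    w1 = [1.0 / ei for ai, ei in zip(a, e) if ai == 1]
    y0 = [yi for yi, ai in zip(y, a) if ai == 0]
    w0 = [1.0 / (1.0 - ei) for ai, ei in zip(a, e) if ai == 0]
    return weighted_quantile(y1, w1, tau) - weighted_quantile(y0, w0, tau)

# Toy data: exposure shifts every outcome by 10, propensity 0.5 for all units.
y = [1, 2, 3, 4, 11, 12, 13, 14]
a = [0, 0, 0, 0, 1, 1, 1, 1]
e = [0.5] * 8
median_effect = ipw_qte(y, a, e, tau=0.5)
```

Setting `tau` close to 1 targets the upper tail, which is the regime of interest for rare, extreme environmental events.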
We focus on the problem of generalizing a causal effect estimated on a randomized controlled trial (RCT) to a target population described by a set of covariates from observational data. Available methods such as inverse propensity weighting are not designed to handle missing values, which are nonetheless common in both data sources. In addition to coupling the assumptions for causal effect identifiability with those for the missing values mechanism and defining appropriate estimation strategies, one difficulty is to account for the specific structure of the data, with two sources and with treatment and outcome only available in the RCT. We propose and compare three multiple imputation strategies (separate imputation, joint imputation with fixed effect, joint imputation without source information), as well as a technique that uses estimators that can handle missing values directly without imputing them. These methods are assessed in an extensive simulation study, showing the empirical superiority of fixed effect multiple imputation followed by any complete-data generalizing estimator. This work is motivated by the analysis of a large registry of over 20,000 major trauma patients and an RCT studying the effect of tranexamic acid administration on mortality. The analysis illustrates how the handling of missing values can impact the conclusion about the effect generalized from the RCT to the target population.