The ability to identify time periods when individuals are most susceptible to exposures, as well as the biological mechanisms through which these exposures act, is of great public health interest. Growing evidence supports an association between prenatal exposure to air pollution and epigenetic marks, such as DNA methylation, but the timing and gene-specific effects of these epigenetic changes are not well understood. Here, we present the first study that aims to identify prenatal windows of susceptibility to air pollution exposures in cord blood DNA methylation. In particular, we propose a function-on-function regression model that leverages data from nearby DNA methylation probes to identify epigenetic regions that exhibit windows of susceptibility to ambient particulate matter less than 2.5 microns (PM$_{2.5}$). By incorporating the covariance structure among both the multivariate DNA methylation outcome and the time-varying exposure under study, this framework yields greater power to detect windows of susceptibility and greater control of false discoveries than methods that model probes independently. We compare our method to a distributed lag model approach that models DNA methylation in a probe-by-probe manner, both in simulation and by application to motivating data from the Project Viva birth cohort. In two epigenetic regions selected based on prior studies of air pollution effects on epigenome-wide methylation, we identify windows of susceptibility to PM$_{2.5}$ exposure near the beginning and middle of the third trimester of pregnancy.
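As a rough illustration of the probe-by-probe distributed lag model used as the comparison method, the sketch below assumes a linear model in which the outcome for a single probe depends on weekly exposures through lag coefficients beta_t, and those coefficients are constrained to a low-degree polynomial in time. The polynomial basis, the function names, and the toy numbers are assumptions made for illustration and are not taken from the study. The sketch shows why the constraint turns the problem into an ordinary regression on a few reduced exposure summaries.

```python
def poly_basis(n_weeks, degree):
    """Polynomial basis b_k(t) = s^k on rescaled time s = t / (n_weeks - 1)."""
    denom = max(n_weeks - 1, 1)
    return [[(t / denom) ** k for k in range(degree + 1)] for t in range(n_weeks)]


def lag_coefficients(basis, theta):
    """Constrained lag curve: beta_t = sum_k theta_k * b_k(t)."""
    return [sum(b * th for b, th in zip(row, theta)) for row in basis]


def reduced_exposure(x, basis):
    """Collapse weekly exposures x_t into summaries z_k = sum_t b_k(t) * x_t."""
    n_basis = len(basis[0])
    return [sum(basis[t][k] * x[t] for t in range(len(x))) for k in range(n_basis)]


def dlm_effect(x, beta):
    """Cumulative exposure effect sum_t beta_t * x_t for one subject."""
    return sum(xt * bt for xt, bt in zip(x, beta))


# Hypothetical toy values: 5 weeks of exposure, quadratic lag curve.
basis = poly_basis(5, 2)
theta = [0.0, 1.0, -1.0]          # assumed basis coefficients
beta = lag_coefficients(basis, theta)
x = [1.0, 2.0, 3.0, 4.0, 5.0]     # assumed weekly exposure levels
z = reduced_exposure(x, basis)

# The effect computed from the full lag curve equals the effect computed
# from the reduced summaries, so theta can be fitted by regressing on z.
effect_full = dlm_effect(x, beta)
effect_reduced = sum(th * zk for th, zk in zip(theta, z))
```

In this toy setting the lag curve peaks in the middle week and is zero at both ends, which is the kind of shape one would read as a mid-period window of susceptibility.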
Exposures to environmental chemicals during gestation can alter health status later in life. Most studies of maternal exposure to chemicals during pregnancy have focused on a single chemical exposure observed at high temporal resolution. Recent research has turned to focus on exposure to mixtures of multiple chemicals, generally observed at a single time point. We consider statistical methods for analyzing data on chemical mixtures that are observed at a high temporal resolution. As motivation, we analyze the association between exposure to four ambient air pollutants observed weekly throughout gestation and birth weight in a Boston-area prospective birth cohort. To explore patterns in the data, we first apply methods for analyzing data on (1) a single chemical observed at high temporal resolution, and (2) a mixture measured at a single point in time. We highlight the shortcomings of these approaches for temporally-resolved data on exposure to chemical mixtures. Second, we propose a novel method, a Bayesian kernel machine regression distributed lag model (BKMR-DLM), that simultaneously accounts for nonlinear associations and interactions among time-varying measures of exposure to mixtures. BKMR-DLM uses a functional weight for each exposure that parameterizes the window of susceptibility corresponding to that exposure within a kernel machine framework that captures non-linear and interaction effects of the multivariate exposure on the outcome. In a simulation study, we show that the proposed method can better estimate the exposure-response function and, in high signal settings, can identify critical windows in time during which exposure has an increased association with the outcome. Applying the proposed method to the Boston birth cohort data, we find evidence of a negative association between organic carbon and birth weight and that nitrate modifies the organic carbon, ...
We study additive function-on-function regression where the mean response at a particular time point depends on the time point itself as well as the entire covariate trajectory. We develop a computationally efficient estimation methodology based on a novel combination of spline bases with an eigenbasis to represent the trivariate kernel function. We discuss prediction of a new response trajectory, propose an inference procedure that accounts for total variability in the predicted response curves, and construct pointwise prediction intervals. The estimation and inference procedure accommodates realistic scenarios such as correlated error structure as well as sparse and/or irregular designs. We investigate the finite-sample performance of our methodology through simulations and two real data applications.
In this paper, a functional partial quantile regression approach, a quantile regression analog of the functional partial least squares regression, is proposed to estimate the function-on-function linear quantile regression model. A partial quantile covariance function is first used to extract the functional partial quantile regression basis functions. The extracted basis functions are then used to obtain the functional partial quantile regression components and estimate the final model. In our proposal, the functional forms of the discretely observed random variables are first constructed via a finite-dimensional basis function expansion method. The functional partial quantile regression constructed using the functional random variables is approximated via the partial quantile regression constructed using the basis expansion coefficients. The proposed method uses an iterative procedure to extract the partial quantile regression components. A Bayesian information criterion is used to determine the optimum number of retained components. The proposed functional partial quantile regression model allows for more than one functional predictor in the model. However, the true form of the proposed model is unspecified, as the relevant predictors for the model are unknown in practice. Thus, a forward variable selection procedure is used to determine the significant predictors for the proposed model. Moreover, a case-sampling-based bootstrap procedure is used to construct pointwise prediction intervals for the functional response. The predictive performance of the proposed method is evaluated using several Monte Carlo experiments under different data generation processes and error distributions. Through an empirical data example, air quality data are analyzed to demonstrate the effectiveness of the proposed method.
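Quantile regression, on which the approach above builds, replaces squared error with the check (pinball) loss. The sketch below is a minimal illustration of that loss for a scalar quantile only; it does not implement the functional partial quantile regression procedure, and the function names and sample values are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_check_loss(values, candidate, tau):
    """Sum of check losses when `candidate` is used as the tau-th quantile."""
    return sum(check_loss(v - candidate, tau) for v in values)


def best_candidate(values, tau):
    """Observed value that minimises the total check loss (a sample quantile)."""
    return min(sorted(values), key=lambda c: total_check_loss(values, c, tau))


# Hypothetical sample with one large value.
values = [1.0, 2.0, 3.0, 4.0, 10.0]
median = best_candidate(values, 0.5)
upper = best_candidate(values, 0.9)
```

With tau = 0.5 the loss penalises positive and negative residuals equally and the minimiser is the sample median; larger values of tau penalise under-prediction more heavily and push the minimiser toward the upper tail.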
Environmental data may be large due to the number of records, the number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates with nonlinear relationships, whereas spatial regression, when using reduced rank methods, has a reputation for good predictive performance when using many records that are spatially autocorrelated. In this study, we compare these two techniques using a data set containing the macroinvertebrate multimetric index (MMI) at 1859 stream sites with over 200 landscape covariates. A primary application is mapping MMI predictions and prediction errors at 1.1 million perennial stream reaches across the conterminous United States. For the spatial regression model, we develop a novel transformation procedure that estimates Box-Cox transformations to linearize covariate relationships and handles possibly zero-inflated covariates. We find that the spatial regression model with transformations, and a subsequent selection of significant covariates, has cross-validation performance slightly better than random forests. We also find that prediction interval coverage is close to nominal for each method, but that spatial regression prediction intervals tend to be narrower and have less variability than quantile regression forest prediction intervals. A simulation study is used to generalize results and clarify advantages of each modeling approach.
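For readers unfamiliar with the Box-Cox family mentioned above, a minimal sketch of the standard one-parameter transformation follows. It covers only the textbook form for strictly positive values. The study's procedure for estimating the transformation and for handling zero-inflated covariates is not reproduced here, and the function name is hypothetical.

```python
import math


def box_cox(x, lam):
    """Standard Box-Cox transform of a strictly positive value x.

    Returns log(x) when lam == 0 and (x**lam - 1) / lam otherwise.
    """
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# Hypothetical covariate values transformed with a square-root-like power.
covariate = [1.0, 4.0, 9.0, 16.0]
transformed = [box_cox(v, 0.5) for v in covariate]
```

The lam = 0 case is the limit of the power form as lam approaches zero, which is why the family moves smoothly between a log transform and no transform (lam = 1, up to a shift).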
We propose a versatile joint regression framework for count responses. The method is implemented in the R add-on package GJRM and allows for modelling linear and non-linear dependence through the use of several copulae. Moreover, the parameters of the marginal distributions of the count responses and of the copula can be specified as flexible functions of covariates. Motivated by a football application, we also discuss an extension which forces the regression coefficients of the marginal (linear) predictors to be equal via a suitable penalisation. Model fitting is based on a trust region algorithm which simultaneously estimates all the parameters of the joint models. We investigate the proposals' empirical performance in two simulation studies, the first one designed for arbitrary count data, the other one reflecting football-specific settings. Finally, the method is applied to FIFA World Cup data, showing that it is competitive with the standard approach with regard to predictive performance.