Nitrogen dioxide (NO$_2$) is a primary constituent of traffic-related air pollution and has well-established harmful environmental and human-health impacts. Knowledge of the spatiotemporal distribution of NO$_2$ is critical for exposure and risk assessment. A common approach for assessing air pollution exposure is linear regression involving spatially referenced covariates, known as land-use regression (LUR). We develop a scalable approach for simultaneous variable selection and estimation of LUR models with spatiotemporally correlated errors, by combining a general-Vecchia Gaussian process approximation with a penalty on the LUR coefficients. In comparisons with existing methods on simulated data, our approach achieved higher model-selection specificity and sensitivity and better prediction in terms of calibration and sharpness across a wide range of relevant settings. In our spatiotemporal analysis of daily, US-wide, ground-level NO$_2$ data, our approach was more accurate and produced a sparser and more interpretable model. Our daily predictions elucidate spatiotemporal patterns of NO$_2$ concentrations across the United States, including significant variations between cities and intra-urban variation. Thus, our predictions will be useful for epidemiological and risk-assessment studies that require daily, national-scale predictions, including acute-outcome health-risk assessments.
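To illustrate how a penalty on regression coefficients can perform variable selection and estimation at once, the following is a minimal sketch of L1-penalized least squares fitted by coordinate descent. It assumes independent errors, so it does not reproduce the general-Vecchia Gaussian process approximation described above. The choice of an L1 penalty, the function names `soft_threshold` and `lasso_cd`, and the toy data are illustrative assumptions and are not taken from the paper.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X is a list of n rows with p covariate values each, y a list of n responses.
    Independent errors are assumed (a simplification of the correlated-error setting).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X @ beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero covariate cannot be estimated
            # Correlation of covariate j with the partial residual (its own effect added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Toy data: the first covariate drives the response, the second is nearly irrelevant.
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y_toy = [2.0, 0.1, 2.0, 0.1]
beta_hat = lasso_cd(X_toy, y_toy, lam=0.2)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print("coefficients:", beta_hat, "selected covariates:", selected)
```

In this toy example the penalty sets the weak coefficient exactly to zero, which is the mechanism that yields a sparser, more interpretable model. Handling correlated errors would require replacing the least-squares term with a likelihood that accounts for the error covariance.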
A spatiotemporal calibration and resolution refinement model was fitted to calibrate nitrogen dioxide (NO$_2$) concentration estimates from the Community Multiscale Air Quality (CMAQ) model, using two sources of observed NO$_2$ data that differed in their spatial and temporal resolutions. To refine the spatial resolution of the CMAQ model estimates, we incorporated additional local covariates, including total traffic volume within 2 km, population density, elevation, and land-use characteristics. Predictions from this model greatly reduced the bias in the CMAQ estimates, as shown by the much lower mean squared error (MSE) at the NO$_2$ monitor sites. The final model was used to predict the daily concentration of ambient NO$_2$ over the entire state of Connecticut on a grid with pixels of size 300 m x 300 m. A comparison of the prediction map with a similar map for the CMAQ estimates showed marked improvement in the spatial resolution. The effect of local covariates was evident in the finer spatial resolution map, where the contribution of traffic on major highways to ambient NO$_2$ concentration stands out. An animation was also provided to show the change in the concentration of ambient NO$_2$ over space and time for 1994 and 1995.
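The idea of calibrating model output against monitor observations and judging the result by MSE can be sketched with a much simpler stand-in: a single linear calibration fitted by ordinary least squares. This is not the spatiotemporal model described above, and it uses no local covariates. The function names `fit_linear_calibration` and `mse`, and all numbers, are made up for illustration.

```python
def mse(pred, obs):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def fit_linear_calibration(model_est, obs):
    """Ordinary least squares fit of obs ~ intercept + slope * model_est."""
    n = len(obs)
    mean_x = sum(model_est) / n
    mean_y = sum(obs) / n
    sxx = sum((x - mean_x) ** 2 for x in model_est)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(model_est, obs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical gridded model estimates and co-located monitor observations.
model_est = [10.0, 20.0, 30.0, 40.0]
monitor_obs = [6.0, 11.0, 16.0, 21.0]

intercept, slope = fit_linear_calibration(model_est, monitor_obs)
calibrated = [intercept + slope * x for x in model_est]

print("MSE before calibration:", mse(model_est, monitor_obs))
print("MSE after calibration: ", mse(calibrated, monitor_obs))
```

Here the MSE is evaluated on the same points used for fitting, which flatters the calibrated estimates; a held-out set of monitor sites would give a fairer comparison.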
This paper proposes a new methodology to predict and update the residual useful lifetime of a system using a sequence of degradation images. The methodology integrates tensor linear algebra with traditional location-scale regression widely used in reliability and prognosis. To address the high dimensionality challenge, the degradation image streams are first projected to a low-dimensional tensor subspace that is able to preserve their information. Next, the projected image tensors are regressed against time-to-failure via penalized location-scale tensor regression. The coefficient tensor is then decomposed using CANDECOMP/PARAFAC (CP) and Tucker decompositions, which enables parameter estimation in a high-dimensional setting. Two optimization algorithms with a global convergence property are developed for model estimation. The effectiveness of our models is validated using a simulated dataset and infrared degradation image streams from rotating machinery.
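A short sketch may help show why a CP decomposition of the coefficient tensor makes estimation tractable. For simplicity it uses a two-way coefficient (a single image) rather than the image streams considered above, and writes the coefficient as a sum of R rank-one outer products of vectors a_r and b_r. The function names and the toy values are illustrative assumptions.

```python
def cp_to_matrix(A, B):
    """Form the full I x J coefficient matrix sum_r outer(a_r, b_r) from CP factors."""
    n_rows, n_cols = len(A[0]), len(B[0])
    coef = [[0.0] * n_cols for _ in range(n_rows)]
    for a, b in zip(A, B):
        for i in range(n_rows):
            for j in range(n_cols):
                coef[i][j] += a[i] * b[j]
    return coef


def cp_inner_product(X, A, B):
    """Inner product <coef, X> computed from the factors, without forming coef."""
    total = 0.0
    for a, b in zip(A, B):
        total += sum(a[i] * X[i][j] * b[j]
                     for i in range(len(a)) for j in range(len(b)))
    return total


# Rank-1 example for a 2 x 3 "image": 2 + 3 = 5 free parameters instead of 2 * 3 = 6.
A = [[1.0, 2.0]]          # R = 1 factor vector of length I = 2
B = [[3.0, 4.0, 5.0]]     # R = 1 factor vector of length J = 3
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]

print("full coefficient matrix:", cp_to_matrix(A, B))
print("linear predictor <coef, X>:", cp_inner_product(X, A, B))
```

The saving grows quickly with size: for an I x J coefficient of rank R, the factored form has R(I + J) parameters rather than IJ, and the gap widens further for higher-order tensors.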
Ambient concentrations of many pollutants are associated with emissions due to human activity, such as road transport and other combustion sources. In this paper we consider air pollution as a multi-level phenomenon within a Bayesian hierarchical model. We examine different scales of variation in pollution concentrations, ranging from large-scale transboundary effects to more localised effects that are directly related to human activity. Specifically, in the first stage of the model, we isolate underlying patterns in pollution concentrations due to global factors such as underlying climate and topography, which are modelled together with spatial structure. At this stage we use measurements from monitoring sites located within rural areas, chosen as far as possible to reflect background concentrations. Having isolated these global effects, in the second stage we assess the effects of human activity on pollution in urban areas. The proposed model was applied to concentrations of nitrogen dioxide measured throughout the EU, for which significant increases are found to be associated with human activity in urban areas. The approach proposed here provides valuable information that could be used in performing health impact assessments and to inform policy.
Model fitting often aims to fit a single model, assuming that the imposed form of the model is correct. However, there may be multiple possible underlying explanatory patterns in a set of predictors that could explain a response. Model selection that disregards model uncertainty can fail to bring these patterns to light. We present multi-model penalized regression (MMPR) to acknowledge model uncertainty in the context of penalized regression. In the penalty form explored here, we examine how different settings can promote either shrinkage or sparsity of coefficients in separate models. The method is tuned to explicitly limit model similarity. A choice of penalty form that enforces variable selection is applied to predict stacking fault energy (SFE) from steel alloy composition. The aim is to identify multiple models with different subsets of covariates that explain a single type of response.
Vector-based cellular automata (CA) built on real land parcels have become an important trend in current urban development simulation studies. Compared with raster-based and parcel-based CA models, vector CA models are difficult to apply widely because of their complex data structures and technical difficulties. This study proposes UrbanVCA, a new vector CA-based urban development simulation framework that supports multiple machine-learning models. To better measure simulation accuracy, this study also proposes, for the first time, a vector-based landscape index (VecLI) model based on real land parcels. With Shunde, Guangdong as the study area, UrbanVCA simulated multiple types of urban land-use change at the land-parcel level with high accuracy (FoM = 0.243) and a landscape index similarity of 87.3%. The simulation results for 2030 show that the eco-protection scenario can promote urban agglomeration and reduce ecological aggression and the loss of arable land by at least 60%. We have also developed and released the UrbanVCA software for urban planners and researchers.
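The FoM (figure of merit) reported above is commonly defined in land-change modelling as hits divided by the sum of hits, misses, wrong-category changes, and false alarms. The sketch below computes it per parcel under that common definition. Whether UrbanVCA weights parcels by area or uses exactly this form is not stated in the abstract, so the optional `areas` argument, the function name, and the toy parcels are assumptions.

```python
def figure_of_merit(start, observed, simulated, areas=None):
    """Figure of merit for simulated land-use change.

    start, observed, simulated: land-use class of each parcel at the start date,
    at the observed end date, and at the simulated end date.
    areas: optional parcel weights (hypothetical; defaults to equal weights).
    """
    if areas is None:
        areas = [1.0] * len(start)
    hits = misses = wrong_category = false_alarms = 0.0
    for s, o, m, a in zip(start, observed, simulated, areas):
        observed_change = o != s
        simulated_change = m != s
        if observed_change and simulated_change:
            if o == m:
                hits += a            # change simulated to the correct class
            else:
                wrong_category += a  # change simulated, but to the wrong class
        elif observed_change:
            misses += a              # real change simulated as persistence
        elif simulated_change:
            false_alarms += a        # persistence simulated as change
        # Correctly simulated persistence does not enter the figure of merit.
    denom = hits + misses + wrong_category + false_alarms
    return hits / denom if denom else 0.0


# Five hypothetical parcels: one hit, one miss, one false alarm,
# one correct persistence, and one wrong-category change.
start     = ["farm",  "farm",  "farm",  "urban", "farm"]
observed  = ["urban", "urban", "farm",  "urban", "industrial"]
simulated = ["urban", "farm",  "urban", "urban", "urban"]
print("FoM:", figure_of_merit(start, observed, simulated))
```

Because correctly simulated persistence is excluded, FoM values are typically far lower than overall accuracy figures, which helps put a value such as 0.243 in context.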