
Calibration of Spatial Forecasts from Citizen Science Urban Air Pollution Data with Sparse Recurrent Neural Networks

Posted by Matthew Bonas
Publication date: 2021
Research field: Mathematical Statistics
Language: English





With their continued increase in coverage and quality, data collected from personal air quality monitors have become an increasingly valuable tool to complement existing public health monitoring systems over urban areas. However, the potential of using such citizen science data for automatic early warning systems is hampered by the lack of models able to capture the high-resolution, nonlinear spatio-temporal features stemming from local emission sources such as traffic, residential heating, and commercial activities. In this work, we propose a machine learning approach to forecast high-frequency spatial fields which has two distinctive advantages over standard neural network methods in time: 1) sparsity of the neural network via a spike-and-slab prior, and 2) a small parametric space. The introduction of stochastic neural networks generates additional uncertainty, and in this work we propose a fast approach for forecast calibration, both marginal and spatial. We focus on assessing exposure to urban air pollution in San Francisco, and our results suggest an improvement of 35.7% in the mean squared error over a standard time series approach, with a calibrated forecast for up to 5 days.
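As a rough illustration of the two ingredients highlighted above, the sketch below builds an echo-state-style recurrent forecaster whose recurrent weights are sparsified by Bernoulli inclusion indicators (a spike-and-slab-style mask) and trains only a small ridge-regression readout. This is a minimal sketch under assumed hyperparameters (`n_hidden`, `pi_slab`, `ridge`), not the authors' implementation.

```python
# Minimal sketch (not the authors' implementation): an echo-state-style
# recurrent forecaster whose recurrent weights are sparsified by Bernoulli
# "spike-and-slab" inclusion indicators. Names and hyperparameters are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sparse_recurrent_forecaster(y, n_hidden=200, pi_slab=0.1, ridge=1e-2):
    """Fit a one-step-ahead forecaster for a univariate series y."""
    # Spike-and-slab style recurrent weights: most entries are exactly zero
    # (the "spike"); the remainder are drawn from a Gaussian "slab".
    mask = rng.random((n_hidden, n_hidden)) < pi_slab
    W = np.where(mask, rng.normal(0, 1, (n_hidden, n_hidden)), 0.0)
    W *= 0.9 / max(np.abs(np.linalg.eigvals(W)).max(), 1e-8)  # stabilize the recurrence
    W_in = rng.normal(0, 1, n_hidden)

    # Run the recurrence and collect hidden states.
    h = np.zeros(n_hidden)
    H = []
    for t in range(len(y) - 1):
        h = np.tanh(W @ h + W_in * y[t])
        H.append(h.copy())
    H = np.array(H)

    # Ridge-regression readout: the only trained parameters, which keeps
    # the effective parametric space small.
    A = H.T @ H + ridge * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ y[1:])
    return W, W_in, beta

# Toy usage on a synthetic series.
y = np.sin(np.linspace(0, 20, 500)) + 0.1 * rng.normal(size=500)
W, W_in, beta = sparse_recurrent_forecaster(y)
```

Because only the readout vector `beta` is estimated, the trainable parametric space stays small even when the hidden state is large, which is the spirit of the second advantage listed in the abstract.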




Read also

In health-pollution cohort studies, accurate predictions of pollutant concentrations at new locations are needed, since the locations of fixed monitoring sites and study participants are often spatially misaligned. For multi-pollutant data, principal component analysis (PCA) is often incorporated to obtain a low-rank (LR) structure of the data prior to spatial prediction. Recently developed predictive PCA modifies the traditional algorithm to improve the overall predictive performance by leveraging both LR and spatial structures within the data. However, predictive PCA requires complete data or an initial imputation step. Nonparametric imputation techniques without accounting for spatial information may distort the underlying structure of the data, and thus further reduce the predictive performance. We propose a convex optimization problem inspired by the LR matrix completion framework and develop a proximal algorithm to solve it. Missing data are imputed and handled concurrently within the algorithm, which eliminates the necessity of a separate imputation step. We show that our algorithm has low computational burden and leads to reliable predictive performance as the severity of missing data increases.
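For intuition on how a proximal algorithm can impute and complete a low-rank matrix within a single loop, here is a minimal soft-impute-style sketch (singular-value soft-thresholding for a nuclear-norm penalty); the function name and the penalty `lam` are illustrative assumptions, and this is not the authors' exact algorithm.

```python
# Minimal sketch of a proximal (singular-value soft-thresholding) iteration
# for nuclear-norm-regularized matrix completion; illustrative only.
import numpy as np

def soft_impute(X, observed, lam=1.0, n_iter=100):
    """X: data matrix (arbitrary values at unobserved entries).
    observed: boolean mask of observed entries."""
    Z = np.where(observed, X, 0.0)  # current completed matrix
    for _ in range(n_iter):
        # Fill unobserved entries with the current low-rank estimate.
        Y = np.where(observed, X, Z)
        # Proximal step: soft-threshold the singular values.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        s = np.maximum(s - lam, 0.0)
        Z = (U * s) @ Vt
    return Z

# Toy usage: recover a rank-2 matrix with roughly 30% missing entries.
rng = np.random.default_rng(1)
M = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 20))
mask = rng.random(M.shape) > 0.3
M_hat = soft_impute(M, mask, lam=0.5)
```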
Air pollution constitutes the highest environmental risk factor in relation to health. In order to provide the evidence required for health impact analyses, to inform policy and to develop potential mitigation strategies, comprehensive information is required on the state of air pollution. Information on air pollution traditionally comes from ground monitoring (GM) networks, but these may not be able to provide sufficient coverage and may need to be supplemented with information from other sources (e.g. chemical transport models; CTMs). However, these may only be available on grids and may not capture micro-scale features that may be important in assessing air quality in areas of high population. We develop a model that allows calibration between multiple data sources available at different levels of support by allowing the coefficients of calibration equations to vary over space and time, enabling downscaling where the data is sufficient to support it. The model is used to produce high-resolution (1 km $\times$ 1 km) estimates of NO$_2$ and PM$_{2.5}$ across Western Europe for 2010-2016. Concentrations of both pollutants decreased during this period; however, there remain large populations exposed to levels exceeding the WHO Air Quality Guidelines, and thus air pollution remains a serious threat to health.
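To make the calibration idea concrete, a regression with coefficients varying over space $s$ and time $t$ takes the schematic form below; the symbols are illustrative and this is only a sketch of the general construction, not necessarily the paper's exact specification.

```latex
% Ground-monitor measurements regressed on gridded CTM output with
% spatially and temporally varying calibration coefficients (schematic)
y_{\mathrm{GM}}(s,t) \;=\; \beta_0(s,t) + \beta_1(s,t)\, x_{\mathrm{CTM}}(s,t) + \varepsilon(s,t),
\qquad \varepsilon(s,t) \sim N\!\left(0, \sigma^2\right)
```

The varying intercept $\beta_0(s,t)$ and slope $\beta_1(s,t)$ are what permit downscaling wherever the ground data are dense enough to identify them.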
Recurrent Neural Networks (RNNs) are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modelling. Sparsity is a technique to reduce compute and memory requirements of deep learning models. Sparse RNNs are easier to deploy on devices and high-end server processors. Even though sparse operations need less compute and memory relative to their dense counterparts, the speed-up observed by using sparse operations is less than expected on different hardware platforms. In order to address this issue, we investigate two different approaches to induce block sparsity in RNNs: pruning blocks of weights in a layer and using group lasso regularization to create blocks of weights with zeros. Using these techniques, we demonstrate that we can create block-sparse RNNs with sparsity ranging from 80% to 90% with small loss in accuracy. This allows us to reduce the model size by roughly 10x. Additionally, we can prune a larger dense network to recover this loss in accuracy while maintaining high block sparsity and reducing the overall parameter count. Our technique works with a variety of block sizes up to 32x32. Block-sparse RNNs eliminate overheads related to data storage and irregular memory accesses while increasing hardware efficiency compared to unstructured sparsity.
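The two sparsification routes mentioned above can be illustrated in a few lines: block-wise magnitude pruning keeps only the strongest weight blocks, while the group-lasso proximal operator shrinks whole blocks toward zero during training. The block size, keep ratio, and penalty below are illustrative assumptions, not the values used in the paper.

```python
# Minimal sketch of block-wise magnitude pruning and a group-lasso proximal
# step on an RNN weight matrix; block size and thresholds are illustrative.
import numpy as np

def prune_blocks(W, block=4, keep_ratio=0.1):
    """Zero out all but the top `keep_ratio` fraction of blocks by L2 norm."""
    rows, cols = W.shape[0] // block, W.shape[1] // block
    blocks = W[:rows * block, :cols * block].reshape(rows, block, cols, block)
    norms = np.linalg.norm(blocks, axis=(1, 3))
    thresh = np.quantile(norms, 1.0 - keep_ratio)
    mask = (norms >= thresh)[:, None, :, None]
    return (blocks * mask).reshape(rows * block, cols * block)

def group_lasso_prox(W, block=4, lam=0.05):
    """Proximal operator of a group-lasso penalty over non-overlapping blocks:
    shrink each block's norm by lam, zeroing blocks whose norm falls below it."""
    rows, cols = W.shape[0] // block, W.shape[1] // block
    blocks = W.reshape(rows, block, cols, block)
    norms = np.linalg.norm(blocks, axis=(1, 3), keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return (blocks * scale).reshape(W.shape)

# Toy usage on a random recurrent weight matrix.
W = np.random.default_rng(2).normal(size=(64, 64))
W_pruned = prune_blocks(W, block=4, keep_ratio=0.1)   # roughly 90% block sparsity
W_shrunk = group_lasso_prox(W, block=4, lam=0.5)
```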
The health impact of long-term exposure to air pollution is now routinely estimated using spatial ecological studies, due to the recent widespread availability of spatially referenced pollution and disease data. However, this areal unit study design presents a number of statistical challenges, which if ignored have the potential to bias the estimated pollution-health relationship. One such challenge is how to control for the spatial autocorrelation present in the data after accounting for the known covariates, which is caused by unmeasured confounding. A second challenge is how to adjust the functional form of the model to account for the spatial misalignment between the pollution and disease data, which causes within-area variation in the pollution data. These challenges have largely been ignored in existing long-term spatial air pollution and health studies, so here we propose a novel Bayesian hierarchical model that addresses both challenges, and provide software to allow others to apply our model to their own data. The effectiveness of the proposed model is compared by simulation against a number of state-of-the-art alternatives proposed in the literature, and it is then used to estimate the impact of nitrogen dioxide and particulate matter concentrations on respiratory hospital admissions in a new epidemiological study in England in 2010 at the Local Authority level.
Air pollution is a major risk factor for global health, with both ambient and household air pollution contributing substantial components of the overall global disease burden. One of the key drivers of adverse health effects is fine particulate matter ambient pollution (PM$_{2.5}$), to which an estimated 3 million deaths can be attributed annually. The primary source of information for estimating exposures has been measurements from ground monitoring networks but, although coverage is increasing, there remain regions in which monitoring is limited. Ground monitoring data therefore need to be supplemented with information from other sources, such as satellite retrievals of aerosol optical depth and chemical transport models. A hierarchical modelling approach for integrating data from multiple sources is proposed, allowing spatially-varying relationships between ground measurements and other factors that estimate air quality. Set within a Bayesian framework, the resulting Data Integration Model for Air Quality (DIMAQ) is used to estimate exposures, together with associated measures of uncertainty, on a high-resolution grid covering the entire world. Bayesian analysis on this scale can be computationally challenging, and here approximate Bayesian inference is performed using Integrated Nested Laplace Approximations. Model selection and assessment are performed by cross-validation, with the final model offering substantial increases in predictive accuracy, particularly in regions where there is sparse ground monitoring, when compared to current approaches: root mean square error (RMSE) reduced from 17.1 to 10.7, and population-weighted RMSE from 23.1 to 12.1 $\mu$g m$^{-3}$. Based on summaries of the posterior distributions for each grid cell, it is estimated that 92% of the world's population reside in areas exceeding the World Health Organization's Air Quality Guidelines.