Mexico City tracks ground-level ozone levels to assess compliance with national ambient air quality standards and to prevent environmental health emergencies. Ozone levels show distinct patterns over the course of the day, across the city, and over the course of the year. To model these data, we use covariance models over space, circular time, and linear time. We review existing models and develop new classes of nonseparable covariance models of this type that are appropriate for quasi-periodic data collected at many locations. With these covariance models, we use nearest-neighbor Gaussian processes to predict hourly ozone levels at unobserved locations in April and May, the peak ozone season, to infer compliance with Mexican air quality standards and to estimate the respiratory health risk associated with ozone. Predicted compliance with air quality standards and estimated respiratory health risk vary greatly over space and time. In some regions, we predict exceedance of national standards for more than a third of the hours in April and May. On many days, we predict that nearly all of Mexico City exceeds nationally legislated ozone thresholds at least once. In peak regions, we estimate respiratory risk from ozone to be 55% higher on average than the annual average risk and as much as 170% higher on some days.
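To make the circular-time ingredient concrete, here is a minimal Python sketch of a covariance over space, hour of day (circular time), and elapsed hours (linear time). It uses a separable product of exponential terms as an illustrative baseline, not the nonseparable classes developed in the paper; all function names and range parameters are hypothetical.

```python
import numpy as np

def circular_distance(h1, h2, period=24.0):
    """Shortest distance between two clock times on a circle of the given period."""
    d = np.abs(h1 - h2) % period
    return np.minimum(d, period - d)

def spacetime_cov(s1, s2, t1, t2, sigma2=1.0, rho_s=10.0, rho_c=6.0, rho_t=72.0):
    # Separable exponential covariance over space, circular time, and linear time.
    # Illustrative baseline only: the paper's models are nonseparable.
    ds = np.linalg.norm(np.asarray(s1) - np.asarray(s2))  # spatial distance
    dc = circular_distance(t1 % 24.0, t2 % 24.0)          # hour-of-day distance
    dt = abs(t1 - t2)                                     # linear lag in hours
    return sigma2 * np.exp(-ds / rho_s - dc / rho_c - dt / rho_t)
```

The exponential kernel with arc (geodesic) distance is positive definite on the circle, so the circular factor is a valid covariance; a nonseparable model would couple the three distances rather than multiplying independent factors.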
Two algorithms are proposed to simulate space-time Gaussian random fields with a covariance function belonging to an extended Gneiting class, the definition of which depends on a completely monotone function associated with the spatial structure and a conditionally negative definite function associated with the temporal structure. In both cases, the simulated random field is constructed as a weighted sum of cosine waves, with a Gaussian spatial frequency vector and a uniform phase. The two algorithms differ in how they handle the temporal component. The first relies on a spectral decomposition in order to simulate a temporal frequency conditional upon the spatial one, while in the second the temporal frequency is replaced by an intrinsic random field whose variogram is proportional to the conditionally negative definite function associated with the temporal structure. Both algorithms are scalable, as their computational cost is proportional to the number of space-time locations, which may be unevenly spaced in space and/or in time. They are illustrated and validated through synthetic examples.
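The cosine-wave construction can be illustrated directly. The sketch below simulates a zero-mean, unit-variance Gaussian field as sqrt(2/n) times a sum of cosines with Gaussian spatial frequencies and uniform phases; for simplicity the temporal frequency is drawn independently of the spatial one, which yields a separable Gaussian-in-space, Gaussian-in-time covariance rather than the extended Gneiting class of the paper. All names and parameters are illustrative.

```python
import numpy as np

def cosine_wave_field(coords, times, n_waves=5000, a=1.0, b=0.1, seed=0):
    """Sketch of a random cosine-wave simulator.
    coords: (n, d) array of spatial locations; times: (n,) array of time points.
    With omega ~ N(0, 2a I) and tau ~ N(0, 2b), the simulated field has
    covariance exp(-a|h|^2 - b u^2), a separable special case."""
    rng = np.random.default_rng(seed)
    d = coords.shape[1]
    omega = rng.normal(scale=np.sqrt(2 * a), size=(n_waves, d))  # spatial frequencies
    tau = rng.normal(scale=np.sqrt(2 * b), size=n_waves)         # temporal frequencies
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n_waves)            # uniform phases
    args = coords @ omega.T + np.outer(times, tau) + phi         # (n, n_waves)
    return np.sqrt(2.0 / n_waves) * np.cos(args).sum(axis=1)
```

The cost is O(n · n_waves) for n space-time locations, consistent with the linear scaling claimed in the abstract; in the paper's first algorithm, tau would instead be drawn from a spectral distribution conditional on omega.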
The Global Historical Climatology Network-Daily database contains, among other variables, daily maximum and minimum temperatures from weather stations around the globe. It has long been known that climatological summary statistics based on daily temperature minima and maxima are inaccurate if the bias due to the time at which the observations were collected is not accounted for. Despite some previous work, to our knowledge no satisfactory solution to this important problem exists. In this paper, we carefully detail the problem and develop a novel approach to address it. Our idea is to impute the hourly temperatures at the location of the measurements by borrowing information from nearby stations that record hourly temperatures; these imputations can then be used to create accurate summaries of temperature extremes. The key difficulty is that the imputed temperature curves must satisfy the constraint of falling between the observed daily minima and maxima, and of attaining those values at least once in a twenty-four-hour period. We develop a spatiotemporal Gaussian process model for imputing the hourly measurements from the nearby stations, and a novel, easy-to-implement Markov chain Monte Carlo technique to sample from the posterior distribution subject to these constraints. We validate our imputation model using hourly temperature data from four meteorological stations in Iowa, one of which is held out with its data replaced by daily minima and maxima, and show that the imputed temperatures recover the hidden temperatures well. We also demonstrate that our model can exploit information contained in the data to infer the time of daily measurements.
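The min/max constraint is the crux of the sampling problem. A naive way to see why a specialized MCMC scheme is needed is the rejection sketch below, which draws hourly curves from an unconstrained posterior and keeps only those that stay within the daily extremes and (approximately) attain both; with continuous draws, exact attainment has probability zero, so a small slack tol is used. All names and the tolerance are hypothetical, and the acceptance rate of this scheme would typically be far too low in practice, which is what motivates a tailored sampler like the paper's.

```python
import numpy as np

def satisfies_daily_constraints(hourly, tmin, tmax, tol=0.25):
    """Constraint from the abstract for one day of 24 hourly values:
    stay within [tmin, tmax] and come within tol of both extremes."""
    inside = np.all(hourly >= tmin) and np.all(hourly <= tmax)
    return inside and hourly.min() <= tmin + tol and hourly.max() >= tmax - tol

def rejection_impute(sample_hourly, tmin, tmax, max_tries=100000):
    """Naive rejection sampler: sample_hourly() draws one 24-vector from
    the unconstrained spatiotemporal GP posterior."""
    for _ in range(max_tries):
        hourly = sample_hourly()
        if satisfies_daily_constraints(hourly, tmin, tmax):
            return hourly
    raise RuntimeError("acceptance too low; a constrained MCMC sampler is needed")
```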
One of the most significant barriers to medication treatment is patients' non-adherence to a prescribed medication regimen. The extent of the impact of poor adherence on resulting health measures is often unknown, and typical analyses ignore the time-varying nature of adherence. This paper develops a framework for modeling longitudinally recorded health measures as a function of time-varying medication adherence or other time-varying covariates. Our framework, which relies on normal Bayesian dynamic linear models (DLMs), accounts for time-varying covariates such as adherence and non-dynamic covariates such as baseline health characteristics. Given the inefficiency of standard inferential procedures for DLMs when response data are infrequently and irregularly recorded, we develop an approach that factors the posterior density into a product of two terms: a marginal posterior density for the non-dynamic parameters, and a multivariate normal posterior density for the dynamic parameters conditional on the non-dynamic ones. This factorization leads to a two-stage process for inference in which the non-dynamic parameters can be inferred separately from the time-varying parameters. We demonstrate the application of this model to the time-varying effect of anti-hypertensive medication on blood pressure levels in a cohort of patients diagnosed with hypertension. Our model's results are compared to those obtained when adherence is incorporated through non-dynamic summaries.
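The factorization underlying the two-stage procedure is p(psi, theta_{1:T} | y) = p(psi | y) p(theta_{1:T} | psi, y), where psi collects the non-dynamic parameters and theta_{1:T} the dynamic states. The first stage needs the marginal likelihood p(y | psi), which the Kalman filter delivers for a normal DLM while naturally skipping missing observations, matching infrequent, irregular response data. The sketch below is a generic univariate filter, not the authors' implementation; all names are illustrative.

```python
import numpy as np

def kalman_loglik(y, F, G, V, W, m0, C0):
    """Marginal log-likelihood p(y | psi) of a univariate normal DLM,
    integrating out the dynamic states. y: observations with np.nan for
    missing times; F: (p,) observation vector; G: (p, p) evolution matrix;
    V: observation variance; W: (p, p) evolution covariance."""
    m, C, ll = m0, C0, 0.0
    for yt in y:
        a = G @ m                       # prior state mean at time t
        R = G @ C @ G.T + W             # prior state covariance at time t
        if np.isnan(yt):                # no observation: propagate only
            m, C = a, R
            continue
        f = F @ a                       # one-step forecast mean
        Q = F @ R @ F + V               # one-step forecast variance
        e = yt - f                      # forecast error
        ll += -0.5 * (np.log(2 * np.pi * Q) + e**2 / Q)
        K = R @ F / Q                   # Kalman gain
        m = a + K * e                   # filtered state mean
        C = R - np.outer(K, F @ R)      # filtered state covariance
    return ll
```

Given psi (e.g., from optimizing or sampling this marginal), the second stage draws theta_{1:T} from its conditionally multivariate normal posterior, for example by forward-filtering backward-sampling.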
This paper presents a dynamic linear model for hourly ozone concentrations over the eastern United States. The model, developed within a Bayesian hierarchical framework, inherits the important feature of such models that its coefficients, treated as states of the process, can change with time. Thus the model includes a time-varying, site-invariant mean field as well as time-varying coefficients for the 24- and 12-hour diurnal cycle components. The model's great flexibility comes at the cost of computational complexity, forcing us to use an MCMC approach and to restrict application of the model to a small number of monitoring sites. We critically assess this model and identify some of its weaknesses in this type of application.
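The diurnal structure here amounts to harmonic regressors at the 24- and 12-hour periods whose coefficients evolve as states. A minimal sketch of such a design matrix (all names hypothetical):

```python
import numpy as np

def diurnal_design(hours):
    """Design matrix with an intercept plus sine/cosine pairs for the
    24-hour and 12-hour diurnal cycles described in the abstract."""
    hours = np.asarray(hours, dtype=float)
    cols = [np.ones_like(hours)]
    for period in (24.0, 12.0):
        w = 2.0 * np.pi * hours / period
        cols += [np.cos(w), np.sin(w)]
    return np.column_stack(cols)   # shape (n, 5)
```

In the DLM, the coefficient on each column follows its own evolution equation, giving the time-varying mean field and cycle amplitudes described above.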
In nonseparable triangular models with a binary endogenous treatment and a binary instrumental variable, Vuong and Xu (2017) show that individual treatment effects (ITEs) are identifiable. Feng, Vuong and Xu (2019) show that a kernel density estimator that uses nonparametrically estimated ITEs as observations is uniformly consistent for the density of the ITE. In this paper, we establish the asymptotic normality of the density estimator of Feng, Vuong and Xu (2019) and show that, despite their faster rate of convergence, the ITE estimation errors have a non-negligible effect on the asymptotic distribution of the density estimator. We propose asymptotically valid standard errors for the density of the ITE that account for the estimation of the ITEs as well as for bias correction. Furthermore, we develop uniform confidence bands for the density of the ITE using nonparametric or jackknife multiplier bootstrap critical values. Our uniform confidence bands have asymptotically correct coverage probabilities with polynomial error rates and can be used for inference on the shape of the ITE distribution.
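For intuition, the multiplier-bootstrap idea for a uniform band around a kernel density estimate can be sketched as follows. This textbook version treats the estimated ITEs as if they were exact observations and omits the corrections for ITE estimation error and bias that the paper develops; all names and defaults are hypothetical.

```python
import numpy as np

def kde_with_multiplier_band(ites, grid, h, n_boot=1000, level=0.95, seed=0):
    """Gaussian-kernel density estimate on `grid` with a Gaussian-multiplier
    bootstrap uniform confidence band. ites: (n,) estimated ITEs treated as
    data; h: bandwidth. Returns (fhat, lower, upper)."""
    rng = np.random.default_rng(seed)
    ites, grid = np.asarray(ites), np.asarray(grid)
    n = len(ites)
    K = np.exp(-0.5 * ((grid[:, None] - ites[None, :]) / h) ** 2) / np.sqrt(2 * np.pi)
    fhat = K.mean(axis=1) / h                       # KDE on the grid
    centered = K / h - fhat[:, None]                # centered influence terms
    xi = rng.standard_normal((n_boot, n))           # Gaussian multipliers
    sups = np.max(np.abs(centered @ xi.T), axis=0) / n  # sup of multiplier process
    c = np.quantile(sups, level)                    # bootstrap critical value
    return fhat, fhat - c, fhat + c                 # lower band may be clipped at 0
```

The band is fhat plus/minus the level-quantile of the sup of the multiplier process; the paper's bands additionally correct for the first-step estimation of the ITEs and for bias.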