The Global Historical Climatology Network-Daily database contains, among other variables, daily maximum and minimum temperatures from weather stations around the globe. It has long been known that climatological summary statistics based on daily temperature minima and maxima will not be accurate if the bias due to the time at which the observations were collected is not accounted for. Despite some previous work, to our knowledge there is no satisfactory solution to this important problem. In this paper, we carefully detail the problem and develop a novel approach to address it. Our idea is to impute the hourly temperatures at the location of the measurements by borrowing information from nearby stations that record hourly temperatures; the imputed values can then be used to create accurate summaries of temperature extremes. The key difficulty is that these imputations of the temperature curves must satisfy the constraint of falling between the observed daily minima and maxima, and attaining those values at least once in a twenty-four-hour period. We develop a spatiotemporal Gaussian process model for imputing the hourly measurements from the nearby stations, and then develop a novel and easy-to-implement Markov chain Monte Carlo technique to sample from the posterior distribution satisfying the above constraints. We validate our imputation model using hourly temperature data from four meteorological stations in Iowa, of which one is hidden and its data replaced with daily minima and maxima, and show that the imputed temperatures recover the hidden temperatures well. We also demonstrate that our model can exploit information contained in the data to infer the time of daily measurements.
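As an illustration of the two ideas in the abstract above, the following minimal sketch shows how the reset hour of a min/max thermometer changes the recorded daily extremes, and how one might check that a candidate hourly curve satisfies the min/max constraint on an hourly grid. It is not the paper's model or sampler: the sinusoidal temperature curve and the function names `synthetic_hourly`, `windowed_extremes`, and `satisfies_constraints` are assumptions made for this example only.

```python
import math

def synthetic_hourly(day_means, amplitude=5.0, peak_hour=15):
    """Hypothetical hourly temperatures: a sinusoid around each day's mean."""
    return [m + amplitude * math.cos(2 * math.pi * (h - peak_hour) / 24)
            for m in day_means for h in range(24)]

def windowed_extremes(hourly, obs_hour):
    """(min, max) over consecutive 24-hour windows that start at obs_hour.

    This mimics a thermometer that is read and reset once a day at obs_hour.
    Only complete windows are returned.
    """
    out = []
    start = obs_hour
    while start + 24 <= len(hourly):
        window = hourly[start:start + 24]
        out.append((min(window), max(window)))
        start += 24
    return out

def satisfies_constraints(curve, t_min, t_max, tol=1e-9):
    """True if the curve stays within [t_min, t_max] and attains both bounds."""
    within = all(t_min - tol <= t <= t_max + tol for t in curve)
    attains_min = any(abs(t - t_min) <= tol for t in curve)
    attains_max = any(abs(t - t_max) <= tol for t in curve)
    return within and attains_min and attains_max

# One hot day followed by three cooler days (made-up numbers).
hourly = synthetic_hourly([30.0, 10.0, 10.0, 10.0])
midnight = windowed_extremes(hourly, 0)    # calendar-day extremes
afternoon = windowed_extremes(hourly, 17)  # thermometer reset at 17:00
```

In this toy series, the first window of the 17:00 schedule covers mostly the second, cooler day, yet its maximum is inherited from the late afternoon of the hot day. This carry-over is the kind of time-of-observation effect the abstract refers to.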
Mexico City tracks ground-level ozone levels to assess compliance with national ambient air quality standards and to prevent environmental health emergencies. Ozone levels show distinct patterns over the course of the day, across the city, and over the course of the year. To model these data, we use covariance models over space, circular time, and linear time. We review existing models and develop new classes of nonseparable covariance models of this type, which are appropriate for quasi-periodic data collected at many locations. With these covariance models, we use nearest-neighbor Gaussian processes to predict hourly ozone levels at unobserved locations in April and May, the peak ozone season, to infer compliance with Mexican air quality standards and to estimate respiratory health risk associated with ozone. Predicted compliance with air quality standards and estimated respiratory health risk vary greatly over space and time. In some regions, we predict exceedance of national standards for more than a third of the hours in April and May. On many days, we predict that nearly all of Mexico City exceeds nationally legislated ozone thresholds at least once. In peak regions, we estimate respiratory risk for ozone to be 55% higher on average than the annual average risk and as much as 170% higher on some days.
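To make the three arguments of such a covariance concrete, here is a minimal sketch of a separable baseline over space, circular time (hour of day), and linear time (day index). It is deliberately simpler than the nonseparable classes the abstract describes: it is just a product of three standard kernels. The function name `space_time_cov` and all parameter names and default values are assumptions for illustration.

```python
import math

def space_time_cov(s1, s2, hour1, hour2, day1, day2,
                   sigma2=1.0, range_space=10.0, ell_hour=1.0, range_day=5.0):
    """Separable covariance: space x circular time x linear time.

    s1, s2       : (x, y) coordinates (hypothetical planar units)
    hour1, hour2 : hour of day in [0, 24), treated as points on a circle
    day1, day2   : day index on the ordinary (linear) time axis
    """
    # Exponential kernel in Euclidean distance.
    dist = math.hypot(s1[0] - s2[0], s1[1] - s2[1])
    c_space = math.exp(-dist / range_space)

    # Standard periodic kernel with period 24, so hours 23 and 1 are close.
    dh = (hour1 - hour2) % 24
    c_circ = math.exp(-2.0 * math.sin(math.pi * dh / 24.0) ** 2 / ell_hour ** 2)

    # Exponential kernel in the day lag.
    c_lin = math.exp(-abs(day1 - day2) / range_day)

    return sigma2 * c_space * c_circ * c_lin
```

Because each factor is a valid covariance on its own domain, their product is also valid. A separable form like this cannot express interactions such as a daily cycle whose shape changes across the city, which is one motivation for nonseparable models.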
This paper presents a new approach to robust Gaussian process (GP) regression. Most existing approaches replace an outlier-prone Gaussian likelihood with a non-Gaussian likelihood induced from a heavy-tailed distribution, such as the Laplace distribution or the Student-t distribution. However, the use of a non-Gaussian likelihood incurs the need for computationally expensive approximate computation in Bayesian posterior inference. The proposed approach models an outlier as a noisy and biased observation of an unknown regression function; accordingly, the likelihood contains bias terms that explain the degree of deviation from the regression function. We detail how the biases can be estimated accurately, together with the other hyperparameters, by regularized maximum likelihood estimation. Conditioned on the bias estimates, the robust GP regression reduces to a standard GP regression problem with analytical forms of the predictive mean and variance estimates. Therefore, the proposed approach is simple and computationally attractive. It also gives a robust and accurate GP estimate in many tested scenarios. For the numerical evaluation, we perform a comprehensive simulation study that compares the proposed approach with existing robust GP approaches under various simulated scenarios with different outlier proportions and different noise levels. The approach is applied to data from two measurement systems, where the predictors are based on robust environmental parameter measurements and the response variables rely on more complex chemical sensing methods whose measurements contain a certain percentage of outliers. The utility of the measurement systems and the value of the environmental data are improved through the computationally efficient GP regression and bias model.
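One plausible way to write down the reduction described in the abstract above is sketched below. The notation is assumed for illustration and is not taken from the paper: $\delta_i$ is the bias of observation $i$, $\hat{\delta}$ is the vector of bias estimates, $K$ is the kernel matrix on the training inputs, $k_*$ is the vector of kernel values between a test input $x_*$ and the training inputs, and $\sigma^2$ is the noise variance. The form of the regularizer used to estimate $\delta$ is not specified here.

```latex
\begin{align*}
y_i &= f(x_i) + \delta_i + \varepsilon_i, \qquad
  \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \qquad
  f \sim \mathcal{GP}(0, k), \\
\hat{\mu}(x_*) &= k_*^\top \left(K + \sigma^2 I\right)^{-1} \left(y - \hat{\delta}\right), \\
\hat{v}(x_*) &= k(x_*, x_*) - k_*^\top \left(K + \sigma^2 I\right)^{-1} k_* .
\end{align*}
```

Under these assumptions, once $\hat{\delta}$ is fixed the predictive equations are those of standard GP regression applied to the bias-corrected responses $y - \hat{\delta}$, so no approximate posterior inference is needed.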
In applications of climate information, coarse-resolution climate projections commonly need to be downscaled to a finer grid. One challenge of this requirement is the modeling of sub-grid variability and of the spatial and temporal dependence at the finer scale. Here, a post-processing procedure is proposed for temperature projections that addresses this challenge. The procedure employs statistical bias correction and stochastic downscaling in two steps. In the first step, errors that are related to spatial and temporal features of the first two moments of the temperature distribution at model scale are identified and corrected. In the second step, residual space-time dependence at the finer scale is analyzed using a statistical model, from which realizations are generated and then combined with an appropriate climate change signal to form the downscaled projection fields. Using a high-resolution observational gridded data product, the proposed approach is applied in a case study where projections of two regional climate models from the EURO-CORDEX ensemble are bias-corrected and downscaled to a 1 x 1 km grid in the Trøndelag area of Norway. A cross-validation study shows that the proposed procedure generates results that better reflect the marginal distributional properties of the data product and have better consistency in space and time than empirical quantile mapping.
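The idea of correcting the first two moments can be illustrated with a scalar mean-and-variance adjustment. This is only a sketch under strong simplifying assumptions: it treats a single series at one location, whereas the procedure in the abstract above accounts for spatial and temporal features of the moments. The function name `moment_correct` and its arguments are hypothetical.

```python
import statistics

def moment_correct(values, model_ref, obs_ref):
    """Shift and rescale model values so that the reference-period model
    output matches the observed mean and standard deviation.

    values    : model output to correct (e.g. a future period)
    model_ref : model output over the reference period
    obs_ref   : observations over the same reference period
    """
    m_mean = statistics.mean(model_ref)
    m_sd = statistics.pstdev(model_ref)
    o_mean = statistics.mean(obs_ref)
    o_sd = statistics.pstdev(obs_ref)
    if m_sd == 0:
        raise ValueError("model reference series has zero variance")
    scale = o_sd / m_sd
    return [o_mean + scale * (v - m_mean) for v in values]

# Made-up numbers: the model is too cold and too smooth in the reference period.
model_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
obs_ref = [10.0, 12.0, 14.0, 16.0, 18.0]
corrected_ref = moment_correct(model_ref, model_ref, obs_ref)
```

Applying the same shift and scale to a future model series preserves the model's simulated change signal up to the variance rescaling. Unlike empirical quantile mapping, this adjustment only touches the mean and the variance.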
Facing increasing societal and economic pressure, many countries have established strategies to develop renewable energy portfolios, whose penetration in the market can alleviate the dependence on fossil fuels. In the case of wind, there is a fundamental question related to the resilience, and hence profitability, of future wind farms to a changing climate, given that current wind turbines have lifespans of up to thirty years. In this work, we develop a new non-Gaussian method to relate data to simulations and to estimate future wind, predicated on a trans-Gaussian transformation and a cluster-wise minimization of the Kullback-Leibler divergence. Future wind abundance is determined for Saudi Arabia, a country with a recently established plan to develop a portfolio of up to 16 GW of wind energy. Further, we estimate the change in profits over future decades using additional high-resolution simulations, an improved method for vertical wind extrapolation, and power curves from a collection of popular wind turbines. We find an overall increase in the daily profit of $272,000 for the wind energy market for the optimal locations for wind farming in the country.
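For readers unfamiliar with the Kullback-Leibler divergence mentioned in the abstract above, the sketch below gives its closed form for two univariate Gaussian distributions. It does not reproduce the paper's trans-Gaussian transformation or its cluster-wise minimization, and the function name `kl_gaussian` is an assumption for this example.

```python
import math

def kl_gaussian(mu_p, sd_p, mu_q, sd_q):
    """KL(P || Q) for P = N(mu_p, sd_p^2) and Q = N(mu_q, sd_q^2)."""
    if sd_p <= 0 or sd_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sd_q ** 2)
            - 0.5)

# Identical distributions have zero divergence; a unit mean shift at unit
# variance gives 0.5.
same = kl_gaussian(0.0, 1.0, 0.0, 1.0)
shifted = kl_gaussian(0.0, 1.0, 1.0, 1.0)
```

The divergence is asymmetric in its two arguments, so a fitting procedure has to fix which distribution, data or simulation, plays the role of the reference.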
Evolutionary models of languages are usually considered to take the form of trees. With the development of so-called tree constraints, the plausibility of the tree model assumptions can be addressed by checking whether the moments of observed variables lie within regions consistent with trees. In our linguistic application, the data set comprises acoustic samples (audio recordings) from speakers of five Romance languages or dialects. We wish to assess these functional data for compatibility with a hereditary tree model at the language level. A novel combination of canonical function analysis (CFA) with a separable covariance structure provides a method for generating a representative basis for the data. The resulting basis is formed of components that emphasize language differences whilst maintaining the integrity of the observational language groupings. A previously unexploited Gaussian tree constraint is then applied to component-by-component projections of the data to investigate adherence to an evolutionary tree. The results indicate that while a tree model is unlikely to be suitable for modeling all aspects of the acoustic linguistic data, certain features of the spoken Romance languages highlighted by the separable-CFA basis may indeed be suitably modeled as a tree.