No Arabic abstract
In applications of climate information, coarse-resolution climate projections commonly need to be downscaled to a finer grid. One challenge of this requirement is the modeling of sub-grid variability and the spatial and temporal dependence at the finer scale. Here, a post-processing procedure is proposed for temperature projections that addresses this challenge. The procedure employs statistical bias correction and stochastic downscaling in two steps. In a first step, errors that are related to spatial and temporal features of the first two moments of the temperature distribution at model scale are identified and corrected. Secondly, residual space-time dependence at the finer scale is analyzed using a statistical model, from which realizations are generated and then combined with appropriate climate change signal to form the downscaled projection fields. Using a high-resolution observational gridded data product, the proposed approach is applied in a case study where projections of two regional climate models from the EURO-CORDEX ensemble are bias-corrected and downscaled to a 1x1 km grid in the Trondelag area of Norway. A cross-validation study shows that the proposed procedure generates results that better reflect the marginal distributional properties of the data product and have better consistency in space and time than empirical quantile mapping.
The Global Historical Climatology Network-Daily database contains, among other variables, daily maximum and minimum temperatures from weather stations around the globe. It is long known that climatological summary statistics based on daily temperature minima and maxima will not be accurate, if the bias due to the time at which the observations were collected is not accounted for. Despite some previous work, to our knowledge, there does not exist a satisfactory solution to this important problem. In this paper, we carefully detail the problem and develop a novel approach to address it. Our idea is to impute the hourly temperatures at the location of the measurements by borrowing information from the nearby stations that record hourly temperatures, which then can be used to create accurate summaries of temperature extremes. The key difficulty is that these imputations of the temperature curves must satisfy the constraint of falling between the observed daily minima and maxima, and attaining those values at least once in a twenty-four hour period. We develop a spatiotemporal Gaussian process model for imputing the hourly measurements from the nearby stations, and then develop a novel and easy to implement Markov Chain Monte Carlo technique to sample from the posterior distribution satisfying the above constraints. We validate our imputation model using hourly temperature data from four meteorological stations in Iowa, of which one is hidden and the data replaced with daily minima and maxima, and show that the imputed temperatures recover the hidden temperatures well. We also demonstrate that our model can exploit information contained in the data to infer the time of daily measurements.
An important challenge in statistical analysis lies in controlling the bias of estimators due to the ever-increasing data size and model complexity. Approximate numerical methods and data features like censoring and misclassification often result in analytical and/or computational challenges when implementing standard estimators. As a consequence, consistent estimators may be difficult to obtain, especially in complex and/or high dimensional settings. In this paper, we study the properties of a general simulation-based estimation framework that allows to construct bias corrected consistent estimators. We show that the considered approach leads, under more general conditions, to stronger bias correction properties compared to alternative methods. Besides its bias correction advantages, the considered method can be used as a simple strategy to construct consistent estimators in settings where alternative methods may be challenging to apply. Moreover, the considered framework can be easily implemented and is computationally efficient. These theoretical results are highlighted with simulation studies of various commonly used models, including the negative binomial regression (with and without censoring) and the logistic regression (with and without misclassification errors). Additional numerical illustrations are provided in the supplementary materials.
Fine particulate matter (PM2.5) is a mixture of air pollutants that has adverse effects on human health. Understanding the health effects of PM2.5 mixture and its individual species has been a research priority over the past two decades. However, the limited availability of speciated PM2.5 measurements continues to be a major challenge in exposure assessment for conducting large-scale population-based epidemiology studies. The PM2.5 species have complex spatial-temporal and cross dependence structures that should be accounted for in estimating the spatiotemporal distribution of each component. Two major sources of air quality data are commonly used for deriving exposure estimates: point-level monitoring data and gridded numerical computer model simulation, such as the Community Multiscale Air Quality (CMAQ) model. We propose a statistical method to combine these two data sources for estimating speciated PM2.5 concentration. Our method models the complex relationships between monitoring measurements and the numerical model output at different spatial resolutions, and we model the spatial dependence and cross dependence among PM2.5 species. We apply the method to combine CMAQ model output with major PM2.5 species measurements in the contiguous United States in 2011.
Complex interconnections between information technology and digital control systems have significantly increased cybersecurity vulnerabilities in smart grids. Cyberattacks involving data integrity can be very disruptive because of their potential to compromise physical control by manipulating measurement data. This is especially true in large and complex electric networks that often rely on traditional intrusion detection systems focused on monitoring network traffic. In this paper, we develop an online detection algorithm to detect and localize covert attacks on smart grids. Using a network system model, we develop a theoretical framework by characterizing a covert attack on a generator bus in the network as sparse features in the state-estimation residuals. We leverage such sparsity via a regularized linear regression method to detect and localize covert attacks based on the regression coefficients. We conduct a comprehensive numerical study on both linear and nonlinear system models to validate our proposed method. The results show that our method outperforms conventional methods in both detection delay and localization accuracy.
Existing methods for diagnosing predictability in climate indices often make a number of unjustified assumptions about the climate system that can lead to misleading conclusions. We present a flexible family of state-space models capable of separating the effects of external forcing on inter-annual time scales, from long-term trends and decadal variability, short term weather noise, observational errors and changes in autocorrelation. Standard potential predictability models only estimate the fraction of the total variance in the index attributable to external forcing. In addition, our methodology allows us to partition individual seasonal means into forced, slow, fast and error components. Changes in the predictable signal within the season can also be estimated. The model can also be used in forecast mode to assess both intra- and inter-seasonal predictability. We apply the proposed methodology to a North Atlantic Oscillation index for the years 1948-2017. Around 60% of the inter-annual variance in the December-January-February mean North Atlantic Oscillation is attributable to external forcing, and 8% to trends on longer time-scales. In some years, the external forcing remains relatively constant throughout the winter season, in others it changes during the season. Skillful statistical forecasts of the December-January-February mean North Atlantic Oscillation are possible from the end of November onward and predictability extends into March. Statistical forecasts of the December-January-February mean achieve a correlation with the observations of 0.48.