Spatial statistics is an area of study devoted to the statistical analysis of data that have a spatial label associated with them. Geographers often refer to the location information associated with the attribute information, whose study defines a research area called spatial analysis. Many of the ways to manipulate spatial data are driven by algorithms with no uncertainty quantification associated with them. When a spatial analysis is statistical, that is, it incorporates uncertainty quantification, it falls in the research area called spatial statistics. The primary feature of spatial statistical models is that nearby attribute values are more statistically dependent than distant attribute values; this is a paraphrasing of what is sometimes called the First Law of Geography (Tobler, 1970).
We consider the problem of non-parametric testing of independence of two components of a stationary bivariate spatial process. In particular, we revisit the random shift approach, which has become a standard method for testing the independent superposition hypothesis in spatial statistics and is widely used in practical applications. However, this method tends to be liberal because the toroidal correction breaks the marginal spatial correlation structure. As a result, the assumption of exchangeability, which is essential for the Monte Carlo test to be exact, is not fulfilled. We present a number of permutation strategies and show that, in the random field case, the random shift with the variance correction is a suitable improvement over the torus correction: it reduces the liberality and achieves the largest power among all investigated variants. Several approaches to obtaining the variance for the variance correction method were studied. With the sample covariance as the test statistic, the best results were achieved with the correction factor $1/n$, which corresponds to the asymptotic order of the variance of the test statistic. In the point process case, the problem of deviations from exchangeability is far more complex, and we propose an alternative strategy based on the mean cross nearest-neighbor distance and the torus correction. It reduces the liberality but achieves slightly lower power than the usual cross $K$-function. We therefore recommend it when the point patterns are clustered, which is where the cross $K$-function is liberal.
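To make the baseline concrete, the following is a minimal sketch of the plain random shift test with torus correction for two random fields observed on the same rectangular grid, using the sample covariance as the test statistic. It illustrates only the torus variant that the abstract describes as liberal; the variance correction is not implemented here. The function names, the two-sided comparison, and the default of 199 shifts are assumptions made for illustration.

```python
import random


def sample_cov(x, y):
    """Sample covariance (divisor n) of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def torus_shift(field, dr, dc):
    """Shift a grid by (dr, dc) cells, wrapping around the edges (torus)."""
    rows, cols = len(field), len(field[0])
    return [[field[(r - dr) % rows][(c - dc) % cols] for c in range(cols)]
            for r in range(rows)]


def flatten(field):
    """Turn a list of rows into one flat list of values."""
    return [v for row in field for v in row]


def random_shift_test(x, y, n_shifts=199, seed=0):
    """Monte Carlo p-value for independence of x and y via random torus shifts.

    Returns the observed covariance and a two-sided p-value. Because the
    wrapped shifts disturb the correlation structure near the seams, this
    plain version can reject too often for correlated fields.
    """
    rows, cols = len(x), len(x[0])
    if rows * cols < 2:
        raise ValueError("need at least two grid cells to shift")
    rng = random.Random(seed)
    x_flat = flatten(x)
    observed = sample_cov(x_flat, flatten(y))
    count = 1  # the observed statistic counts as one of the replicates
    for _ in range(n_shifts):
        dr, dc = 0, 0
        while dr == 0 and dc == 0:  # skip the identity shift
            dr, dc = rng.randrange(rows), rng.randrange(cols)
        simulated = sample_cov(x_flat, flatten(torus_shift(y, dr, dc)))
        if abs(simulated) >= abs(observed):
            count += 1
    return observed, count / (n_shifts + 1)
```

Calling `random_shift_test(x, y)` with two grids of equal shape returns the observed covariance and the Monte Carlo p-value; a small p-value speaks against independence, subject to the liberality discussed above.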
Physical or geographic location proves to be an important feature in many data science models, because many diverse natural and social phenomena have a spatial component. Spatial autocorrelation measures the extent to which locally adjacent observations of the same phenomenon are correlated. Although statistics like Moran's $I$ and Geary's $C$ are widely used to measure spatial autocorrelation, they are slow: all popular methods run in $\Omega(n^2)$ time, rendering them unusable for large data sets, or for long time-courses with moderate numbers of points. We propose a new $S_A$ statistic based on the notion that the variance observed when merging pairs of nearby clusters should increase slowly for spatially autocorrelated variables. We give a linear-time algorithm to calculate $S_A$ for a variable with an input agglomeration order (available at https://github.com/aamgalan/spatial_autocorrelation). For a typical dataset of $n \approx 63{,}000$ points, our $S_A$ autocorrelation measure can be computed in 1 second, versus 2 hours or more for Moran's $I$ and Geary's $C$. Through simulation studies, we demonstrate that $S_A$ identifies spatial correlations in variables generated with a spatially dependent model half an order of magnitude earlier than either Moran's $I$ or Geary's $C$. Finally, we prove several theoretical properties of $S_A$: namely, that it behaves as a true correlation statistic and is invariant under addition or multiplication by a constant.
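For reference, the quadratic cost of the classical statistics is easy to see in a direct implementation. The sketch below computes Moran's $I$ with the standard formula $I = \frac{n}{W}\,\frac{\sum_{i \ne j} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$, where $W = \sum_{i \ne j} w_{ij}$. It is not the $S_A$ algorithm from the abstract, and the binary distance-band weights ($w_{ij} = 1$ when two points lie within `radius`) are an assumption chosen for illustration.

```python
import math


def morans_i(points, values, radius):
    """Naive Moran's I with binary distance-band weights.

    points : list of coordinate tuples
    values : list of observations, one per point
    radius : two distinct points are neighbours if their distance <= radius

    The double loop visits every ordered pair, hence O(n^2) time.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0       # sum of w_ij * dev_i * dev_j
    weight_sum = 0.0  # W, the sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= radius:
                cross += dev[i] * dev[j]
                weight_sum += 1.0
    denom = sum(d * d for d in dev)
    if weight_sum == 0 or denom == 0:
        raise ValueError("no neighbouring pairs or constant values")
    return (n / weight_sum) * cross / denom
```

On six equally spaced points along a line with `radius=1.0`, a linear trend in the values gives a positive $I$ and a strictly alternating pattern gives a negative one, as expected for positive and negative spatial autocorrelation.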
Spatial processes with nonstationary and anisotropic covariance structure are often used when modelling, analysing and predicting complex environmental phenomena. Such processes may often be expressed as ones that have stationary and isotropic covariance structure on a warped spatial domain. However, the warping function is generally difficult to fit and not constrained to be injective, often resulting in 'space-folding'. Here, we propose modelling an injective warping function through a composition of multiple elemental injective functions in a deep-learning framework. We consider two cases: first, when these functions are known up to some weights that need to be estimated, and, second, when the weights in each layer are random. Inspired by recent methodological and technological advances in deep learning and deep Gaussian processes, we employ approximate Bayesian methods to make inference with these models using graphics processing units. Through simulation studies in one and two dimensions we show that the deep compositional spatial models are quick to fit, and are able to provide better predictions and uncertainty quantification than other deep stochastic models of similar complexity. We also show their remarkable capacity to model nonstationary, anisotropic spatial data using radiances from the MODIS instrument aboard the Aqua satellite.
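The key structural idea, that a composition of injective maps is itself injective, can be illustrated in one dimension. The sketch below is a hypothetical example, not the elemental functions used in the work summarised above: each layer is the identity plus a non-negatively weighted sigmoid, which is strictly increasing, so stacking any number of such layers cannot fold the domain. The function names and the `(weight, slope, centre)` parameterisation are assumptions.

```python
import math


def sigmoid_unit(s, weight, slope, centre):
    """One hypothetical warping layer: s + weight * sigmoid(slope * (s - centre)).

    With weight >= 0 and slope > 0 the map is strictly increasing in s,
    hence injective.
    """
    if weight < 0 or slope <= 0:
        raise ValueError("need weight >= 0 and slope > 0 to keep the map injective")
    return s + weight / (1.0 + math.exp(-slope * (s - centre)))


def compose_warp(s, layers):
    """Apply the layers in order; a composition of increasing maps is increasing."""
    for weight, slope, centre in layers:
        s = sigmoid_unit(s, weight, slope, centre)
    return s
```

For example, warping an increasing grid on $[0, 1]$ with `layers = [(0.5, 20.0, 0.3), (0.3, 10.0, 0.7)]` stretches the regions around the two centres while preserving the ordering of the grid points, so no two locations are mapped to the same warped coordinate.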
The areal modeling of the extremes of a natural process such as rainfall or temperature is important in environmental statistics; for example, understanding extreme areal rainfall is crucial in flood protection. This article reviews recent progress in the statistical modeling of spatial extremes, starting with sketches of the necessary elements of extreme value statistics and geostatistics. The main types of statistical models thus far proposed, based on latent variables, on copulas and on spatial max-stable processes, are described and then are compared by application to a data set on rainfall in Switzerland. Whereas latent variable modeling allows a better fit to marginal distributions, it fits the joint distributions of extremes poorly, so appropriately-chosen copula or max-stable models seem essential for successful spatial modeling of extremes.
The goal of the article is to develop the approach of substationarity for spatial point processes (SPPs). Substationarity is a new concept that has not previously been studied in the literature. It means that the distribution of an SPP is invariant only under location shifts within a linear subspace of the domain. Theoretically, substationarity lies between stationarity and nonstationarity, but it is formally a form of nonstationarity. To formally propose the approach, the article provides the definition of substationarity and an estimation method for the first-order intensity function. As the linear subspace may be unknown, it recommends estimating the linear subspace parametrically and the first-order intensity function nonparametrically, which makes the approach semiparametric. The simulation studies show that the estimators of both the linear subspace and the first-order intensity function are reliable. In an application to a forest wildfire data set, the article concludes that substationarity of wildfire occurrences may be assumed along the longitude, indicating that latitude is a more important factor than longitude in forest wildfire studies.