
Fast Spatial Autocorrelation

Published by Anar Amgalan
Publication date: 2020
Language: English





Physical or geographic location proves to be an important feature in many data science models, because many diverse natural and social phenomena have a spatial component. Spatial autocorrelation measures the extent to which locally adjacent observations of the same phenomenon are correlated. Although statistics like Moran's $I$ and Geary's $C$ are widely used to measure spatial autocorrelation, they are slow: all popular methods run in $\Omega(n^2)$ time, rendering them unusable for large data sets, or for long time-courses with moderate numbers of points. We propose a new $S_A$ statistic based on the notion that the variance observed when merging pairs of nearby clusters should increase slowly for spatially autocorrelated variables. We give a linear-time algorithm to calculate $S_A$ for a variable with an input agglomeration order (available at https://github.com/aamgalan/spatial_autocorrelation). For a typical dataset of $n \approx 63{,}000$ points, our $S_A$ autocorrelation measure can be computed in 1 second, versus 2 hours or more for Moran's $I$ and Geary's $C$. Through simulation studies, we demonstrate that $S_A$ identifies spatial correlations in variables generated with a spatially dependent model half an order of magnitude earlier than either Moran's $I$ or Geary's $C$. Finally, we prove several theoretical properties of $S_A$: namely that it behaves as a true correlation statistic, and is invariant under addition or multiplication by a constant.
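To make the quadratic cost of the two classical baselines concrete, here is a minimal sketch of their textbook definitions. It is not the authors' $S_A$ code (that lives in the repository linked above). The function names and the inverse-distance weighting scheme are assumptions chosen for illustration. Both statistics touch every pair of points through a dense $n \times n$ weight matrix, which is where the $\Omega(n^2)$ time and memory come from.

```python
import numpy as np

def inverse_distance_weights(coords):
    """Dense n x n weights w_ij = 1 / d_ij with a zero diagonal.

    Inverse-distance weighting is an assumed choice for this sketch;
    other schemes (k-nearest neighbours, distance bands) are common too.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    w = np.zeros_like(dist)
    mask = dist > 0
    w[mask] = 1.0 / dist[mask]
    return w

def morans_i(x, w):
    """Moran's I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)

def gearys_c(x, w):
    """Geary's C = (n - 1) * sum_ij w_ij (x_i - x_j)^2 / (2 W sum_i z_i^2)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    sq_diff = (x[:, None] - x[None, :]) ** 2  # all pairwise squared differences
    return ((x.size - 1) / (2.0 * w.sum())) * (w * sq_diff).sum() / (z @ z)

# Toy usage: a smooth gradient on a 10 x 10 grid.
coords = np.array([[r, c] for r in range(10) for c in range(10)], dtype=float)
x = coords[:, 0] + coords[:, 1]
w = inverse_distance_weights(coords)
print(morans_i(x, w), gearys_c(x, w))
```

Values of Moran's $I$ above its null expectation $-1/(n-1)$, and values of Geary's $C$ below 1, indicate positive spatial autocorrelation. Both baselines are unchanged when a constant is added to `x` or when `x` is multiplied by a nonzero constant, the same invariance the abstract claims for $S_A$.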


Read also

187 - Samuel I. Watson 2020
Clusters form the basis of a number of research study designs including survey and experimental studies. Cluster-based designs can be less costly but also less efficient than individual-based designs due to correlation between individuals within the same cluster. Their design typically relies on ad hoc choices of correlation parameters, and is insensitive to variations in cluster design. This article examines how to efficiently design clusters where they are geographically defined by demarcating areas incorporating individuals and households or other units. Using geostatistical models for spatial autocorrelation, we generate approximations to the within-cluster average covariance in order to estimate the effective sample size given particular cluster design parameters. We show how the number of enumerated locations, cluster area, proportion sampled, and sampling method affect the efficiency of the design, and consider the optimization problem of choosing the most efficient design subject to budgetary constraints. We also consider how the parameters from these approximations can be interpreted simply in terms of 'real-world' quantities and used in design analysis.
103 - Yawen Guan, Murali Haran 2019
Spatial generalized linear mixed models (SGLMMs) are popular and flexible models for non-Gaussian spatial data. They are useful for spatial interpolations as well as for fitting regression models that account for spatial dependence, and are commonly used in many disciplines such as epidemiology, atmospheric science, and sociology. Inference for SGLMMs is typically carried out under the Bayesian framework, at least in part because computational issues make maximum likelihood estimation challenging, especially when high-dimensional spatial data are involved. Here we provide a computationally efficient projection-based maximum likelihood approach and two computationally efficient algorithms for routinely fitting SGLMMs. The two algorithms proposed are both variants of the expectation maximization (EM) algorithm, using either Markov chain Monte Carlo or a Laplace approximation for the conditional expectation. Our methodology is general and applies to both discrete-domain (Gaussian Markov random field) and continuous-domain (Gaussian process) spatial models. Our methods are also able to adjust for spatial confounding issues that often lead to problems with interpreting regression coefficients. We show, via simulation and real data applications, that our methods perform well in terms of both parameter estimation and prediction. Crucially, our methodology is computationally efficient, scales well with the size of the data, and is applicable to problems where maximum likelihood estimation was previously infeasible.
Estimation of autocorrelations and spectral densities is of fundamental importance in many fields of science, from identifying pulsar signals in astronomy to measuring heart beats in medicine. In circumstances where one is interested in specific autocorrelation functions that do not fit into any simple families of models, such as auto-regressive moving average (ARMA), estimating model parameters is generally approached in one of two ways: by fitting the model autocorrelation function to a nonparametric autocorrelation estimate via regression analysis, or by fitting the model autocorrelation function directly to the data via maximum likelihood. Prior literature suggests that variogram regression yields parameter estimates of comparable quality to maximum likelihood. In this letter we demonstrate that, as sample size increases, the accuracy of the maximum-likelihood estimates (MLE) ultimately improves by orders of magnitude beyond that of variogram regression. For relatively continuous and Gaussian processes, this improvement can occur for sample sizes of less than 100. Moreover, even where the accuracy of these methods is comparable, the MLE remains almost universally better and, more critically, variogram regression does not provide reliable confidence intervals. Inaccurate regression parameter estimates are typically accompanied by underestimated standard errors, whereas likelihood provides reliable confidence intervals.
Estimating the first-order intensity function in point pattern analysis is an important problem, and it has been approached so far from different perspectives: parametrically, semiparametrically or nonparametrically. Our approach is close to a semiparametric one. Motivated by eye-movement data, we introduce a convolution type model where the log-intensity is modelled as the convolution of a function $\beta(\cdot)$, to be estimated, and a single spatial covariate (the image an individual is looking at for eye-movement data). Based on a Fourier series expansion, we show that the proposed model is related to the log-linear model with an infinite number of coefficients, which correspond to the spectral decomposition of $\beta(\cdot)$. After truncation, we estimate these coefficients through a penalized Poisson likelihood and prove infill asymptotic results for a large class of spatial point processes. We illustrate the efficiency of the proposed methodology on simulated data and real data.
Spatial statistics is an area of study devoted to the statistical analysis of data that have a spatial label associated with them. Geographers often refer to the location information associated with the attribute information, whose study defines a research area called spatial analysis. Many of the ways to manipulate spatial data are driven by algorithms with no uncertainty quantification associated with them. When a spatial analysis is statistical, that is, it incorporates uncertainty quantification, it falls in the research area called spatial statistics. The primary feature of spatial statistical models is that nearby attribute values are more statistically dependent than distant attribute values; this is a paraphrasing of what is sometimes called the First Law of Geography (Tobler, 1970).