No Arabic abstract
Gaussian processes are popular and flexible models for spatial, temporal, and functional data, but they are computationally infeasible for large datasets. We discuss Gaussian-process approximations that use basis functions at multiple resolutions to achieve fast inference and that can (approximately) represent any spatial covariance structure. We consider two special cases of this multi-resolution-approximation framework, a taper version and a domain-partitioning (block) version. We describe theoretical properties and inference procedures, and study the computational complexity of the methods. Numerical comparisons and an application to satellite data are also provided.
Automated sensing instruments on satellites and aircraft have enabled the collection of massive amounts of high-resolution observations of spatial fields over large spatial regions. If these datasets can be efficiently exploited, they can provide new insights on a wide variety of issues. However, traditional spatial-statistical techniques such as kriging are not computationally feasible for big datasets. We propose a multi-resolution approximation (M-RA) of Gaussian processes observed at irregular locations in space. The M-RA process is specified as a linear combination of basis functions at multiple levels of spatial resolution, which can capture spatial structure from very fine to very large scales. The basis functions are automatically chosen to approximate a given covariance function, which can be nonstationary. All computations involving the M-RA, including parameter inference and prediction, are highly scalable for massive datasets. Crucially, the inference algorithms can also be parallelized to take full advantage of large distributed-memory computing environments. In comparisons using simulated data and a large satellite dataset, the M-RA outperforms a related state-of-the-art method.
Large, non-Gaussian spatial datasets pose a considerable modeling challenge as the dependence structure implied by the model needs to be captured at different scales, while retaining feasible inference. Skew-normal and skew-t distributions have only recently begun to appear in the spatial statistics literature, without much consideration, however, for the ability to capture dependence at multiple resolutions, and simultaneously achieve feasible inference for increasingly large data sets. This article presents the first multi-resolution spatial model inspired by the skew-t distribution, where a large-scale effect follows a multivariate normal distribution and the fine-scale effects follow a multivariate skew-normal distributions. The resulting marginal distribution for each region is skew-t, thereby allowing for greater flexibility in capturing skewness and heavy tails characterizing many environmental datasets. Likelihood-based inference is performed using a Monte Carlo EM algorithm. The model is applied as a stochastic generator of daily wind speeds over Saudi Arabia.
Generalized Gaussian processes (GGPs) are highly flexible models that combine latent GPs with potentially non-Gaussian likelihoods from the exponential family. GGPs can be used in a variety of settings, including GP classification, nonparametric count regression, modeling non-Gaussian spatial data, and analyzing point patterns. However, inference for GGPs can be analytically intractable, and large datasets pose computational challenges due to the inversion of the GP covariance matrix. We propose a Vecchia-Laplace approximation for GGPs, which combines a Laplace approximation to the non-Gaussian likelihood with a computationally efficient Vecchia approximation to the GP, resulting in a simple, general, scalable, and accurate methodology. We provide numerical studies and comparisons on simulated and real spatial data. Our methods are implemented in a freely available R package.
This paper introduces a framework for speeding up Bayesian inference conducted in presence of large datasets. We design a Markov chain whose transition kernel uses an (unknown) fraction of (fixed size) of the available data that is randomly refreshed throughout the algorithm. Inspired by the Approximate Bayesian Computation (ABC) literature, the subsampling process is guided by the fidelity to the observed data, as measured by summary statistics. The resulting algorithm, Informed Sub-Sampling MCMC (ISS-MCMC), is a generic and flexible approach which, contrary to existing scalable methodologies, preserves the simplicity of the Metropolis-Hastings algorithm. Even though exactness is lost, i.e. the chain distribution approximates the posterior, we study and quantify theoretically this bias and show on a diverse set of examples that it yields excellent performances when the computational budget is limited. If available and cheap to compute, we show that setting the summary statistics as the maximum likelihood estimator is supported by theoretical arguments.
Spatio-temporal data sets are rapidly growing in size. For example, environmental variables are measured with ever-higher resolution by increasing numbers of automated sensors mounted on satellites and aircraft. Using such data, which are typically noisy and incomplete, the goal is to obtain complete maps of the spatio-temporal process, together with proper uncertainty quantification. We focus here on real-time filtering inference in linear Gaussian state-space models. At each time point, the state is a spatial field evaluated on a very large spatial grid, making exact inference using the Kalman filter computationally infeasible. Instead, we propose a multi-resolution filter (MRF), a highly scalable and fully probabilistic filtering method that resolves spatial features at all scales. We prove that the MRF matrices exhibit a particular block-sparse multi-resolution structure that is preserved under filtering operations through time. We also discuss inference on time-varying parameters using an approximate Rao-Blackwellized particle filter, in which the integrated likelihood is computed using the MRF. We compare the MRF to existing approaches in a simulation study and a real satellite-data application.