Gaussian conditional realizations are routinely used for risk assessment and planning in a variety of Earth science applications. Conditional realizations can be obtained by first creating unconditional realizations and then post-conditioning them by kriging. Many efficient algorithms are available for the first step, so the bottleneck lies in the second step. Instead of performing the conditional simulations with the desired covariance (F approach) or with a tapered covariance (T approach), we propose to use the tapered covariance only in the conditioning step (Half-Taper or HT approach). This speeds up the computations and reduces memory requirements for the conditioning step while preserving the correct short-scale variations in the realizations. A criterion based on the mean square error of the simulation is derived to help anticipate how closely HT matches F. Moreover, an index is used to predict the sparsity of the kriging matrix for the conditioning step. Guidelines for choosing the taper function are discussed. The distributions of a series of 1D, 2D and 3D scalar response functions are compared for the F, T and HT approaches. The resulting distributions are much closer to F with HT than with T.
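The HT idea can be sketched in a few lines of Python. This is a minimal 1D illustration under our own assumptions, not the authors' implementation: an exponential covariance stands in for the desired covariance, a spherical correlation serves as the taper, the unconditional realization is drawn by Cholesky factorization (affordable only for a tiny grid), and a dense `numpy.linalg.solve` replaces the sparse solver that would exploit the tapered kriging matrix in practice. The function names and the parameters `scale` and `theta` are hypothetical. Post-conditioning uses the identity Z_c = Z_u + W (z - Z_u(obs)), where W holds the simple-kriging weights; in the F approach W would come from `exp_cov` alone, and in the T approach the tapered covariance would also be used in step 1.

```python
import numpy as np


def exp_cov(h, sill=1.0, scale=10.0):
    """Exponential covariance C(h) = sill * exp(-|h| / scale) (assumed model)."""
    return sill * np.exp(-np.abs(h) / scale)


def spherical_taper(h, theta=15.0):
    """Spherical correlation used as taper: exactly zero for |h| >= theta."""
    r = np.minimum(np.abs(h) / theta, 1.0)
    return 1.0 - 1.5 * r + 0.5 * r**3


def half_taper_conditional(x_grid, x_obs, z_obs, rng, theta=15.0):
    """One HT conditional realization at x_grid given data z_obs at x_obs (1D)."""
    n_grid = len(x_grid)
    # Step 1: unconditional realization with the FULL covariance, jointly at grid
    # and data locations (Cholesky is only affordable because the grid is tiny).
    x_all = np.concatenate([x_grid, x_obs])
    c_all = exp_cov(x_all[:, None] - x_all[None, :])
    chol = np.linalg.cholesky(c_all + 1e-10 * np.eye(len(x_all)))
    z_unc = chol @ rng.standard_normal(len(x_all))
    z_unc_grid, z_unc_obs = z_unc[:n_grid], z_unc[n_grid:]

    # Step 2: simple-kriging weights from the TAPERED covariance C * T.
    # K is zero wherever two data are >= theta apart, so a sparse solver
    # could be used; a dense solve keeps the sketch short.
    d_oo = x_obs[:, None] - x_obs[None, :]
    d_go = x_grid[:, None] - x_obs[None, :]
    K = exp_cov(d_oo) * spherical_taper(d_oo, theta)
    k = exp_cov(d_go) * spherical_taper(d_go, theta)
    weights = np.linalg.solve(K, k.T).T  # shape (n_grid, n_obs)

    # Post-conditioning: unconditional field + kriged (data - simulation) residuals.
    return z_unc_grid + weights @ (z_obs - z_unc_obs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_grid = np.arange(0.0, 100.0)
    x_obs = np.array([10.5, 40.5, 70.5])
    z_obs = np.array([1.0, -0.5, 0.3])
    print(half_taper_conditional(x_grid, x_obs, z_obs, rng)[:5])
```

Because the spherical taper has compact support and the product of two valid covariances is again a valid covariance, the tapered kriging matrix stays positive definite while gaining exact zeros; the short-scale behaviour of the realization still comes from the untapered step 1.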
A hybrid estimator of the log-spectral density of a stationary time series is proposed. First, a multiple taper estimate is computed; the logarithm of this multiple taper estimate is then smoothed with a kernel smoother. This procedure reduces the expected mean square error by a factor of $(\pi^2/4)^{4/5}$ compared with simply smoothing the log tapered periodogram. A data-adaptive implementation of a variable-bandwidth kernel smoother is given.
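The two stages of the hybrid estimator can be sketched with NumPy. Several choices here are our assumptions rather than details taken from the abstract: orthonormal sine tapers are used as the taper family, the log bias of an average of k tapered periodograms is removed using E[log(chi2_{2k}/2k)] = digamma(k) - log(k), and a fixed-bandwidth Epanechnikov kernel stands in for the data-adaptive variable-bandwidth smoother. The names `sine_tapers`, `multitaper_spectrum`, `smoothed_log_spectrum` and `half_width` are hypothetical.

```python
import numpy as np


def sine_tapers(n, k):
    """k orthonormal sine tapers of length n (assumed taper family)."""
    t = np.arange(1, n + 1)
    return np.array([np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * j * t / (n + 1))
                     for j in range(1, k + 1)])


def multitaper_spectrum(x, k=5):
    """Average of the k tapered periodograms on the rfft frequency grid."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    eigenspectra = np.abs(np.fft.rfft(sine_tapers(len(x), k) * x, axis=1)) ** 2
    return eigenspectra.mean(axis=0)


def smoothed_log_spectrum(x, k=5, half_width=8):
    """Kernel-smoothed log multitaper estimate with a fixed-bandwidth kernel."""
    log_s = np.log(multitaper_spectrum(x, k))
    # E[log(chi2_{2k} / 2k)] = digamma(k) - log(k); subtract it to remove the log bias.
    digamma_k = -np.euler_gamma + sum(1.0 / j for j in range(1, k))
    log_s = log_s - (digamma_k - np.log(k))
    # Epanechnikov weights over 2 * half_width + 1 neighbouring frequencies.
    u = np.arange(-half_width, half_width + 1) / (half_width + 1)
    w = 0.75 * (1.0 - u**2)
    w /= w.sum()
    # Reflect at the ends (0 and Nyquist) so the output keeps the input length.
    padded = np.pad(log_s, half_width, mode="reflect")
    return np.convolve(padded, w, mode="valid")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1024)
    print(smoothed_log_spectrum(x)[:5])
```

Smoothing on the log scale turns the multiplicative chi-square noise of the spectral estimate into approximately additive noise, and the multitaper average shrinks that noise before the kernel is applied, which is the intuition behind combining the two steps.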
Gaussian processes are popular and flexible models for spatial, temporal, and functional data, but they are computationally infeasible for large datasets. We discuss Gaussian-process approximations that use basis functions at multiple resolutions to achieve fast inference and that can (approximately) represent any spatial covariance structure. We consider two special cases of this multi-resolution-approximation framework, a taper version and a domain-partitioning (block) version. We describe theoretical properties and inference procedures, and study the computational complexity of the methods. Numerical comparisons and an application to satellite data are also provided.
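The structure of the block (domain-partitioning) version can be illustrated with a single level below the coarsest resolution. This is a minimal sketch of our reading of that structure, not the authors' algorithm: inputs are 1D points in [0, 1), the "true" covariance is a Matérn-3/2 model, resolution 0 is a low-rank part built from a few global knots, and resolution 1 keeps the remaining covariance exactly but only between points in the same block. The names `matern32`, `cov`, `mra_block_cov`, `knots` and `n_blocks` are hypothetical, and the dense n x n matrix is formed only to make the structure visible; the multi-level recursion and the taper version are not shown.

```python
import numpy as np


def matern32(d, scale=0.3):
    """Matérn covariance with smoothness 3/2 (assumed 'true' covariance)."""
    r = np.sqrt(3.0) * d / scale
    return (1.0 + r) * np.exp(-r)


def cov(a, b):
    """Cross-covariance matrix between 1D point sets a and b."""
    return matern32(np.abs(a[:, None] - b[None, :]))


def mra_block_cov(x, knots, n_blocks):
    """Dense view of a one-level block approximation for 1D inputs x in [0, 1)."""
    # Resolution 0: low-rank (predictive-process) part built from global knots.
    c_xk = cov(x, knots)
    low_rank = c_xk @ np.linalg.solve(cov(knots, knots), c_xk.T)
    # Resolution 1: the remainder, kept exactly but only within each block.
    remainder = cov(x, x) - low_rank
    block = np.minimum((x * n_blocks).astype(int), n_blocks - 1)
    same_block = block[:, None] == block[None, :]
    return low_rank + np.where(same_block, remainder, 0.0)


if __name__ == "__main__":
    x = np.sort(np.random.default_rng(0).uniform(0.0, 1.0, 40))
    approx = mra_block_cov(x, np.linspace(0.05, 0.95, 5), n_blocks=4)
    print(np.abs(approx - cov(x, x)).max())
```

The remainder is a conditional covariance and hence positive semidefinite, and restricting it to diagonal blocks keeps it so; the approximation is therefore a valid covariance that reproduces the exact variances and the exact within-block covariances.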
This paper introduces a framework for speeding up Bayesian inference conducted in the presence of large datasets. We design a Markov chain whose transition kernel uses an unknown, fixed-size fraction of the available data that is randomly refreshed throughout the algorithm. Inspired by the Approximate Bayesian Computation (ABC) literature, the subsampling process is guided by fidelity to the observed data, as measured by summary statistics. The resulting algorithm, Informed Sub-Sampling MCMC (ISS-MCMC), is a generic and flexible approach which, unlike existing scalable methodologies, preserves the simplicity of the Metropolis-Hastings algorithm. Although exactness is lost, i.e. the chain's distribution only approximates the posterior, we study and quantify this bias theoretically and show on a diverse set of examples that the method performs very well when the computational budget is limited. When the maximum likelihood estimator is available and cheap to compute, we show that theoretical arguments support using it as the summary statistic.
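A toy version of the informed sub-sampling chain can be sketched in plain Python. Everything model-specific is our assumption for illustration: data y_i ~ N(theta, 1) with a wide Gaussian prior on theta, the sample mean (the MLE) as summary statistic, a subset weight proportional to exp(-eps * (S(Y_U) - S(Y))^2), a subset move that swaps `k_swap` indices in and out, and a subsample log-likelihood rescaled by N / n in the theta update. The function name `iss_mcmc` and its parameters `eps`, `k_swap`, `step` and `prior_var` are hypothetical tuning knobs, not the paper's notation.

```python
import math
import random


def iss_mcmc(y, n_sub, n_iter, eps=100.0, k_swap=5, step=0.05, prior_var=100.0, seed=0):
    """Toy informed sub-sampling MCMC for the mean of N(theta, 1) data."""
    rng = random.Random(seed)
    N = len(y)
    s_full = sum(y) / N  # summary statistic on the full data: the MLE (sample mean)
    perm = list(range(N))
    rng.shuffle(perm)
    inside, outside = perm[:n_sub], perm[n_sub:]

    def log_fidelity(idx):
        # Log weight of a subset: how close its summary is to the full-data summary.
        s = sum(y[i] for i in idx) / len(idx)
        return -eps * (s - s_full) ** 2

    def log_post(theta, idx):
        # Subsample log-likelihood rescaled by N / n, plus a N(0, prior_var) log prior.
        ll = -0.5 * sum((y[i] - theta) ** 2 for i in idx)
        return (N / len(idx)) * ll - 0.5 * theta**2 / prior_var

    theta = s_full
    f_cur, lp_cur = log_fidelity(inside), log_post(theta, inside)
    samples = []
    for _ in range(n_iter):
        # 1) Informed subset refresh: swap k_swap indices between inside and outside.
        new_in, new_out = inside[:], outside[:]
        for a, b in zip(rng.sample(range(n_sub), k_swap), rng.sample(range(N - n_sub), k_swap)):
            new_in[a], new_out[b] = outside[b], inside[a]
        f_new = log_fidelity(new_in)
        if math.log(1.0 - rng.random()) < f_new - f_cur:
            inside, outside, f_cur = new_in, new_out, f_new
            lp_cur = log_post(theta, inside)
        # 2) Plain random-walk Metropolis-Hastings on theta given the current subset.
        prop = theta + step * rng.gauss(0.0, 1.0)
        lp_prop = log_post(prop, inside)
        if math.log(1.0 - rng.random()) < lp_prop - lp_cur:
            theta, lp_cur = prop, lp_prop
        samples.append(theta)
    return samples


if __name__ == "__main__":
    gen = random.Random(42)
    data = [gen.gauss(2.0, 1.0) for _ in range(2000)]
    draws = iss_mcmc(data, n_sub=100, n_iter=3000)
    print(sum(draws[500:]) / len(draws[500:]))
```

Each theta update touches only `n_sub` observations, which is where the saving comes from; the subset move favours subsets whose mean resembles the full-data mean, which is the ABC-style "fidelity" guidance described above, and the resulting chain is only approximately targeting the posterior.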
We establish verifiable conditions under which Metropolis-Hastings (MH) algorithms with a position-dependent proposal covariance matrix will or will not have a geometric rate of convergence. Some diffusion-based MH algorithms, such as the Metropolis-adjusted Langevin algorithm (MALA) and pre-conditioned MALA (PCMALA), have a position-independent proposal variance, whereas for other variants of MALA, such as manifold MALA (MMALA), the proposal covariance matrix changes at every iteration. We therefore provide conditions for geometric ergodicity of different variants of Langevin algorithms. These conditions are verified in the context of conditional simulation from the two most popular generalized linear mixed models (GLMMs), namely the binomial GLMM with logit link and the Poisson GLMM with log link. An empirical comparison on some spatial GLMMs shows that the computationally cheaper PCMALA, with an appropriately chosen pre-conditioning matrix, may outperform MMALA.
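A pre-conditioned MALA step with a fixed matrix M can be sketched in NumPy: the proposal is y ~ N(x + (h/2) M grad log pi(x), h M), followed by the usual MH correction. In MMALA, M would instead depend on x, which is the position-dependent case the conditions address. The usage example targets the random effects u of a small Poisson GLMM with log link, y_i ~ Poisson(exp(beta + u_i)) and u ~ N(0, Sigma); the covariance model, the data, the step size `h`, and the choice M = Sigma are hypothetical illustrations, not the paper's recommended pre-conditioner.

```python
import numpy as np


def pcmala(log_target, grad_log_target, x0, M, h, n_iter, rng):
    """Pre-conditioned MALA with a fixed, position-independent matrix M."""
    L = np.linalg.cholesky(M)
    M_inv = np.linalg.inv(M)

    def log_q(to, frm):
        # Log density (up to a constant) of proposing `to` from `frm`.
        diff = to - frm - 0.5 * h * M @ grad_log_target(frm)
        return -0.5 * diff @ M_inv @ diff / h

    x = np.asarray(x0, dtype=float)
    lp = log_target(x)
    samples = np.empty((n_iter, x.size))
    accepted = 0
    for t in range(n_iter):
        y = x + 0.5 * h * M @ grad_log_target(x) + np.sqrt(h) * L @ rng.standard_normal(x.size)
        lp_y = log_target(y)
        log_alpha = lp_y - lp + log_q(x, y) - log_q(y, x)
        if np.log(rng.uniform()) < log_alpha:
            x, lp = y, lp_y
            accepted += 1
        samples[t] = x
    return samples, accepted / n_iter


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = np.linspace(0.0, 1.0, 5)
    Sigma = np.exp(-np.abs(s[:, None] - s[None, :]) / 0.3)  # random-effect covariance
    Sigma_inv = np.linalg.inv(Sigma)
    beta, counts = 0.5, np.array([1, 3, 2, 0, 4])

    def log_t(u):
        return np.sum(counts * (beta + u) - np.exp(beta + u)) - 0.5 * u @ Sigma_inv @ u

    def grad_t(u):
        return counts - np.exp(beta + u) - Sigma_inv @ u

    draws, acc = pcmala(log_t, grad_t, np.zeros(5), Sigma, h=0.3, n_iter=2000, rng=rng)
    print(acc, draws[500:].mean(axis=0))
```

Because M is fixed, its Cholesky factor and inverse are computed once, whereas a position-dependent M (as in MMALA) would require a new factorization at every iteration; this is the cost difference behind the PCMALA versus MMALA comparison.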
Light and Widely Applicable (LWA-) MCMC is a novel approximation of the Metropolis-Hastings kernel targeting a posterior distribution defined on a large number of observations. Inspired by Approximate Bayesian Computation, we design a Markov chain whose transition makes use of an unknown but fixed-size fraction of the available data, where the random choice of sub-sample is guided by the fidelity of this sub-sample to the observed data, as measured by summary (or sufficient) statistics. LWA-MCMC is a generic and flexible approach, as illustrated by the diverse set of examples we explore. In each case LWA-MCMC performs very well, and in some cases it improves dramatically on existing methodologies.