Large-scale local surrogate modeling of stochastic simulation experiments


Abstract in English

Gaussian process (GP) regression in large-data contexts, which often arises in surrogate modeling of stochastic simulation experiments, is challenged by cubic runtimes. Coping with input-dependent noise in that setting is doubly so. Recent advances target reduced computational complexity through local approximation (e.g., LAGP) or otherwise induced sparsity. Yet these do not economically accommodate a common design feature when attempting to separate signal from noise. Replication can offer both statistical and computational efficiencies, motivating several extensions to the local surrogate modeling toolkit. Introducing a nugget into a local kernel structure is just the first step. We argue that a new inducing point formulation (LIGP), already preferred over LAGP on the speed-vs-accuracy frontier, conveys additional advantages when replicates are involved. Woodbury identities allow local kernel structure to be expressed in terms of unique design locations only, increasing the amount of data (i.e., the neighborhood size) that may be leveraged without additional flops. We demonstrate that this upgraded LIGP provides more accurate prediction and uncertainty quantification compared to several modern alternatives. Illustrations are provided on benchmark data, real-world simulation experiments on epidemic management and ocean oxygen concentration, and in an options pricing control framework.

Download