No Arabic abstract
Gaussian processes are powerful, yet analytically tractable models for supervised learning. A Gaussian process is characterized by a mean function and a covariance function (kernel), which are determined by a model selection criterion. The functions to be compared do not just differ in their parametrization but in their fundamental structure. It is often not clear which function structure to choose, for instance to decide between a squared exponential and a rational quadratic kernel. Based on the principle of approximation set coding, we develop a framework for model selection to rank kernels for Gaussian process regression. In our experiments approximation set coding shows promise to become a model selection criterion competitive with maximum evidence (also called marginal likelihood) and leave-one-out cross-validation.
Gaussian process (GP) predictors are an important component of many Bayesian approaches to machine learning. However, even a straightforward implementation of Gaussian process regression (GPR) requires O(n^2) space and O(n^3) time for a dataset of n examples. Several approximation methods have been proposed, but there is a lack of understanding of the relative merits of the different approximations, and in what situations they are most useful. We recommend assessing the quality of the predictions obtained as a function of the compute time taken, and comparing to standard baselines (e.g., Subset of Data and FITC). We empirically investigate four different approximation algorithms on four different prediction problems, and make our code available to encourage future comparisons.
Monge-Kantorovich distances, otherwise known as Wasserstein distances, have received a growing attention in statistics and machine learning as a powerful discrepancy measure for probability distributions. In this paper, we focus on forecasting a Gaussian process indexed by probability distributions. For this, we provide a family of positive definite kernels built using transportation based distances. We provide a probabilistic understanding of these kernels and characterize the corresponding stochastic processes. We prove that the Gaussian processes indexed by distributions corresponding to these kernels can be efficiently forecast, opening new perspectives in Gaussian process modeling.
We introduce Latent Gaussian Process Regression which is a latent variable extension allowing modelling of non-stationary multi-modal processes using GPs. The approach is built on extending the input space of a regression problem with a latent variable that is used to modulate the covariance function over the training data. We show how our approach can be used to model multi-modal and non-stationary processes. We exemplify the approach on a set of synthetic data and provide results on real data from motion capture and geostatistics.
A primary goal of computer experiments is to reconstruct the function given by the computer code via scattered evaluations. Traditional isotropic Gaussian process models suffer from the curse of dimensionality, when the input dimension is high. Gaussian process models with additive correlation functions are scalable to dimensionality, but they are very restrictive as they only work for additive functions. In this work, we consider a projection pursuit model, in which the nonparametric part is driven by an additive Gaussian process regression. The dimension of the additive function is chosen to be higher than the original input dimension. We show that this dimension expansion can help approximate more complex functions. A gradient descent algorithm is proposed to maximize the likelihood function. Simulation studies show that the proposed method outperforms the traditional Gaussian process models.
Learning in Gaussian Process models occurs through the adaptation of hyperparameters of the mean and the covariance function. The classical approach entails maximizing the marginal likelihood yielding fixed point estimates (an approach called textit{Type II maximum likelihood} or ML-II). An alternative learning procedure is to infer the posterior over hyperparameters in a hierarchical specification of GPs we call textit{Fully Bayesian Gaussian Process Regression} (GPR). This work considers two approximation schemes for the intractable hyperparameter posterior: 1) Hamiltonian Monte Carlo (HMC) yielding a sampling-based approximation and 2) Variational Inference (VI) where the posterior over hyperparameters is approximated by a factorized Gaussian (mean-field) or a full-rank Gaussian accounting for correlations between hyperparameters. We analyze the predictive performance for fully Bayesian GPR on a range of benchmark data sets.