No Arabic abstract
We propose a method (TT-GP) for approximate inference in Gaussian Process (GP) models. We build on previous scalable GP research including stochastic variational inference based on inducing inputs, kernel interpolation, and structure exploiting algebra. The key idea of our method is to use Tensor Train decomposition for variational parameters, which allows us to train GPs with billions of inducing inputs and achieve state-of-the-art results on several benchmarks. Further, our approach allows for training kernels based on deep neural networks without any modifications to the underlying GP model. A neural network learns a multidimensional embedding for the data, which is used by the GP to make the final prediction. We train GP and neural network parameters end-to-end without pretraining, through maximization of GP marginal likelihood. We show the efficiency of the proposed approach on several regression and classification benchmark datasets including MNIST, CIFAR-10, and Airline.
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability. We propose the harmonic kernel decomposition (HKD), which uses Fourier series to decompose a kernel as a sum of orthogonal kernels. Our variational approximation exploits this orthogonality to enable a large number of inducing points at a low computational cost. We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections, and it significantly outperforms standard variational methods in scalability and accuracy. Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
Deep Gaussian Processes (DGP) are hierarchical generalizations of Gaussian Processes (GP) that have proven to work effectively on a multiple supervised regression tasks. They combine the well calibrated uncertainty estimates of GPs with the great flexibility of multilayer models. In DGPs, given the inputs, the outputs of the layers are Gaussian distributions parameterized by their means and covariances. These layers are realized as Sparse GPs where the training data is approximated using a small set of pseudo points. In this work, we show that the computational cost of DGPs can be reduced with no loss in performance by using a separate, smaller set of pseudo points when calculating the layerwise variance while using a larger set of pseudo points when calculating the layerwise mean. This enabled us to train larger models that have lower cost and better predictive performance.
The accurate approximation of high-dimensional functions is an essential task in uncertainty quantification and many other fields. We propose a new function approximation scheme based on a spectral extension of the tensor-train (TT) decomposition. We first define a functional version of the TT decomposition and analyze its properties. We obtain results on the convergence of the decomposition, revealing links between the regularity of the function, the dimension of the input space, and the TT ranks. We also show that the regularity of the target function is preserved by the univariate functions (i.e., the cores) comprising the functional TT decomposition. This result motivates an approximation scheme employing polynomial approximations of the cores. For functions with appropriate regularity, the resulting textit{spectral tensor-train decomposition} combines the favorable dimension-scaling of the TT decomposition with the spectral convergence rate of polynomial approximations, yielding efficient and accurate surrogates for high-dimensional functions. To construct these decompositions, we use the sampling algorithm texttt{TT-DMRG-cross} to obtain the TT decomposition of tensors resulting from suitable discretizations of the target function. We assess the performance of the method on a range of numerical examples: a modifed set of Genz functions with dimension up to $100$, and functions with mixed Fourier modes or with local features. We observe significant improvements in performance over an anisotropic adaptive Smolyak approach. The method is also used to approximate the solution of an elliptic PDE with random input data. The open source software and examples presented in this work are available online.
We apply numerical methods in combination with finite-difference-time-domain (FDTD) simulations to optimize transmission properties of plasmonic mirror color filters using a multi-objective figure of merit over a five-dimensional parameter space by utilizing novel multi-fidelity Gaussian processes approach. We compare these results with conventional derivative-free global search algorithms, such as (single-fidelity) Gaussian Processes optimization scheme, and Particle Swarm Optimization---a commonly used method in nanophotonics community, which is implemented in Lumerical commercial photonics software. We demonstrate the performance of various numerical optimization approaches on several pre-collected real-world datasets and show that by properly trading off expensive information sources with cheap simulations, one can more effectively optimize the transmission properties with a fixed budget.
Refining low-resolution (LR) spatial fields with high-resolution (HR) information is challenging as the diversity of spatial datasets often prevents direct matching of observations. Yet, when LR samples are modeled as aggregate conditional means of HR samples with respect to a mediating variable that is globally observed, the recovery of the underlying fine-grained field can be framed as taking an inverse of the conditional expectation, namely a deconditioning problem. In this work, we introduce conditional mean processes (CMP), a new class of Gaussian Processes describing conditional means. By treating CMPs as inter-domain features of the underlying field, a posterior for the latent field can be established as a solution to the deconditioning problem. Furthermore, we show that this solution can be viewed as a two-staged vector-valued kernel ridge regressor and show that it has a minimax optimal convergence rate under mild assumptions. Lastly, we demonstrate its proficiency in a synthetic and a real-world atmospheric field downscaling problem, showing substantial improvements over existing methods.