No Arabic abstract
Deep Gaussian Processes (DGP) are hierarchical generalizations of Gaussian Processes (GP) that have proven to work effectively on a multiple supervised regression tasks. They combine the well calibrated uncertainty estimates of GPs with the great flexibility of multilayer models. In DGPs, given the inputs, the outputs of the layers are Gaussian distributions parameterized by their means and covariances. These layers are realized as Sparse GPs where the training data is approximated using a small set of pseudo points. In this work, we show that the computational cost of DGPs can be reduced with no loss in performance by using a separate, smaller set of pseudo points when calculating the layerwise variance while using a larger set of pseudo points when calculating the layerwise mean. This enabled us to train larger models that have lower cost and better predictive performance.
Gaussian Processes (textbf{GPs}) are flexible non-parametric models with strong probabilistic interpretation. While being a standard choice for performing inference on time series, GPs have few techniques to work in a streaming setting. cite{bui2017streaming} developed an efficient variational approach to train online GPs by using sparsity techniques: The whole set of observations is approximated by a smaller set of inducing points (textbf{IPs}) and moved around with new data. Both the number and the locations of the IPs will affect greatly the performance of the algorithm. In addition to optimizing their locations, we propose to adaptively add new points, based on the properties of the GP and the structure of the data.
We propose a method (TT-GP) for approximate inference in Gaussian Process (GP) models. We build on previous scalable GP research including stochastic variational inference based on inducing inputs, kernel interpolation, and structure exploiting algebra. The key idea of our method is to use Tensor Train decomposition for variational parameters, which allows us to train GPs with billions of inducing inputs and achieve state-of-the-art results on several benchmarks. Further, our approach allows for training kernels based on deep neural networks without any modifications to the underlying GP model. A neural network learns a multidimensional embedding for the data, which is used by the GP to make the final prediction. We train GP and neural network parameters end-to-end without pretraining, through maximization of GP marginal likelihood. We show the efficiency of the proposed approach on several regression and classification benchmark datasets including MNIST, CIFAR-10, and Airline.
Inter-domain Gaussian processes (GPs) allow for high flexibility and low computational cost when performing approximate inference in GP models. They are particularly suitable for modeling data exhibiting global structure but are limited to stationary covariance functions and thus fail to model non-stationary data effectively. We propose Inter-domain Deep Gaussian Processes, an extension of inter-domain shallow GPs that combines the advantages of inter-domain and deep Gaussian processes (DGPs), and demonstrate how to leverage existing approximate inference methods to perform simple and scalable approximate inference using inter-domain features in DGPs. We assess the performance of our method on a range of regression tasks and demonstrate that it outperforms inter-domain shallow GPs and conventional DGPs on challenging large-scale real-world datasets exhibiting both global structure as well as a high-degree of non-stationarity.
We present a multi-task learning formulation for Deep Gaussian processes (DGPs), through non-linear mixtures of latent processes. The latent space is composed of private processes that capture within-task information and shared processes that capture across-task dependencies. We propose two different methods for segmenting the latent space: through hard coding shared and task-specific processes or through soft sharing with Automatic Relevance Determination kernels. We show that our formulation is able to improve the learning performance and transfer information between the tasks, outperforming other probabilistic multi-task learning models across real-world and benchmarking settings.
We present a methodology for integrating functional data into deep densely connected feed-forward neural networks. The model is defined for scalar responses with multiple functional and scalar covariates. A by-product of the method is a set of dynamic functional weights that can be visualized during the optimization process. This visualization leads to greater interpretability of the relationship between the covariates and the response relative to conventional neural networks. The model is shown to perform well in a number of contexts including prediction of new data and recovery of the true underlying functional weights; these results were confirmed through real applications and simulation studies. A forthcoming R package is developed on top of a popular deep learning library (Keras) allowing for general use of the approach.