The combination of low-rank tensor techniques and Fast Fourier transform (FFT) based methods has proved effective in accelerating various statistical operations such as Kriging, computing conditional covariance, geostatistical optimal design, and others. However, the approximation of a full tensor by its low-rank format can be computationally formidable. In this work, we incorporate the robust Tensor Train (TT) approximation of covariance matrices and the efficient TT-Cross algorithm into FFT-based Kriging. We show that the computational complexity of Kriging is thereby reduced to $\mathcal{O}(d r^3 n)$, where $n$ is the mode size of the estimation grid, $d$ is the number of variables (the dimension), and $r$ is the rank of the TT approximation of the covariance matrix. For many popular covariance functions the TT rank $r$ remains stable for increasing $n$ and $d$. The advantages of this approach over plain FFT are demonstrated on synthetic and real data examples.
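As a point of reference for the plain FFT baseline mentioned above, the following is a minimal sketch of a covariance matrix-vector product by circulant embedding. It assumes a stationary covariance on a regular one-dimensional grid, so that the covariance matrix is symmetric Toeplitz; the function names, the exponential covariance, and the grid are illustrative assumptions, and the sketch does not involve the TT format.

```python
import numpy as np

def toeplitz_matvec_fft(c, x):
    """Multiply the symmetric Toeplitz matrix with first column c by x.

    The matrix is embedded in a circulant matrix of size 2n-2, whose
    action is a circular convolution and is therefore diagonalized by the FFT.
    """
    n = len(c)
    # First column of the circulant embedding: c_0..c_{n-1}, c_{n-2}..c_1
    emb = np.concatenate([c, c[-2:0:-1]])
    # Zero-pad x to the size of the embedding
    x_pad = np.concatenate([x, np.zeros(n - 2)])
    y = np.fft.ifft(np.fft.fft(emb) * np.fft.fft(x_pad))
    return y[:n].real

def exponential_covariance_column(n, length_scale=0.1):
    """First column of an exponential covariance matrix on a grid over [0, 1] (illustrative)."""
    grid = np.linspace(0.0, 1.0, n)
    return np.exp(-np.abs(grid - grid[0]) / length_scale)

# Compare against the dense product on a small grid
n = 64
c = exponential_covariance_column(n)
idx = np.arange(n)
dense = c[np.abs(np.subtract.outer(idx, idx))]
x = np.random.default_rng(0).standard_normal(n)
max_error = np.max(np.abs(dense @ x - toeplitz_matvec_fft(c, x)))
```

The FFT product costs $\mathcal{O}(n \log n)$ in one dimension, but on a full $d$-dimensional grid the embedded array has on the order of $n^d$ entries, which is the cost that a low-rank tensor representation is meant to avoid.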
This article reviews the application of advanced Monte Carlo techniques in the context of Multilevel Monte Carlo (MLMC). MLMC is a strategy employed to compute expectations which can be biased in some sense, for instance, by using the discretization of an associated probability law. The MLMC approach works with a hierarchy of biased approximations which become progressively more accurate and more expensive. Using a telescoping representation of the most accurate approximation, the method is able to reduce the computational cost for a given level of error compared with i.i.d. sampling from this most accurate approximation. All of these ideas originated for cases where exact sampling from couples in the hierarchy is possible. This article considers the case where such exact sampling is not currently possible. We consider Markov chain Monte Carlo and sequential Monte Carlo methods which have been introduced in the literature and we describe different strategies which facilitate the application of MLMC within these methods.
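The telescoping representation can be illustrated in the standard setting where exact sampling from couples is possible. The sketch below is an assumption-laden toy example, not one of the methods reviewed in the article: it estimates $E[X_T]$ for a geometric Brownian motion with Euler discretizations of step $T/2^l$ at level $l$, and couples consecutive levels by reusing the same Brownian increments. All names, parameters, and sample sizes are hypothetical.

```python
import numpy as np

def euler_pair(level, n_samples, rng, T=1.0, mu=0.05, sigma=0.2, x0=1.0):
    """Coupled Euler samples of X_T at `level` (fine) and `level - 1` (coarse)."""
    n_fine = 2 ** level
    h = T / n_fine
    dw = rng.normal(0.0, np.sqrt(h), size=(n_samples, n_fine))
    x_fine = np.full(n_samples, x0)
    for k in range(n_fine):
        x_fine = x_fine + mu * x_fine * h + sigma * x_fine * dw[:, k]
    if level == 0:
        # No coarser level: the correction term is the level-0 estimate itself
        return x_fine, np.zeros(n_samples)
    # The coarse path reuses the fine Brownian increments, summed in pairs
    dw_coarse = dw[:, 0::2] + dw[:, 1::2]
    x_coarse = np.full(n_samples, x0)
    for k in range(n_fine // 2):
        x_coarse = x_coarse + mu * x_coarse * 2 * h + sigma * x_coarse * dw_coarse[:, k]
    return x_fine, x_coarse

def mlmc_estimate(samples_per_level, seed=0):
    """Sum of level-wise sample means of (fine - coarse): the telescoping sum."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for level, n_samples in enumerate(samples_per_level):
        fine, coarse = euler_pair(level, n_samples, rng)
        total += np.mean(fine - coarse)
    return total

# Many cheap samples on coarse levels, few expensive samples on fine levels
estimate = mlmc_estimate([20000, 5000, 2500, 1200, 600, 300])
exact = np.exp(0.05)  # E[X_T] = x0 * exp(mu * T) for geometric Brownian motion
```

Because the coupled differences have small variance on the fine levels, most samples can be drawn on the cheap coarse levels, which is the source of the cost reduction.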
In this work we propose an efficient black-box solver for two-dimensional stationary diffusion equations, which is based on a new robust discretization scheme. The idea is to formulate the equation in a derivative-free form with a non-local stencil, which leads to a linear system of equations with a dense matrix. This matrix and the right-hand side are represented in a low-rank parametric representation, the quantized tensor train (QTT) format, so that all operations are performed with logarithmic complexity and memory consumption. Hence very fine grids can be used, and very accurate solutions with extremely high spatial resolution can be obtained. Numerical experiments show that this formulation gives accurate results and can be used with up to $2^{60}$ grid points without conditioning problems, while the total computational time is on the order of several seconds.
Low-rank tensors are an established framework for high-dimensional least-squares problems. We propose to extend this framework by including the concept of block-sparsity. In the context of polynomial regression each sparsity pattern corresponds to some subspace of homogeneous multivariate polynomials. This allows us to adapt the ansatz space to align better with known sample complexity results. The resulting method is tested in numerical experiments and demonstrates improved computational resource utilization and sample efficiency.
Due to the ease of modern data collection, applied statisticians often have access to a large set of covariates that they wish to relate to some observed outcome. Generalized linear models (GLMs) offer a particularly interpretable framework for such an analysis. In these high-dimensional problems, the number of covariates is often large relative to the number of observations, so we face non-trivial inferential uncertainty; a Bayesian approach allows coherent quantification of this uncertainty. Unfortunately, existing methods for Bayesian inference in GLMs require running times roughly cubic in parameter dimension, and so are limited to settings with at most tens of thousands of parameters. We propose to reduce time and memory costs with a low-rank approximation of the data in an approach we call LR-GLM. When used with the Laplace approximation or Markov chain Monte Carlo, LR-GLM provides a full Bayesian posterior approximation and admits running times reduced by a full factor of the parameter dimension. We rigorously establish the quality of our approximation and show how the choice of rank allows a tunable computational-statistical trade-off. Experiments support our theory and demonstrate the efficacy of LR-GLM on real large-scale datasets.
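To make the idea of a low-rank approximation of the data concrete, here is a minimal sketch that assumes the approximation is built from a truncated SVD of the covariate matrix; the abstract does not specify the construction, so this choice, the function name, and the synthetic data are assumptions for illustration only. The linear predictor $X\beta$ is replaced by $(XU)(U^\top \beta)$, where the columns of $U$ are the top $r$ right singular vectors of $X$.

```python
import numpy as np

def low_rank_projection(X, r):
    """Return (X @ U, U), where U holds the top-r right singular vectors of X."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    U = vt[:r].T                      # shape (D, r), orthonormal columns
    return X @ U, U

# Synthetic covariates with exact rank 5 (hypothetical sizes)
rng = np.random.default_rng(0)
N, D, r = 200, 50, 5
X = rng.standard_normal((N, r)) @ rng.standard_normal((r, D))
beta = rng.standard_normal(D)

XU, U = low_rank_projection(X, r)
# Exact here because X has rank r; in general the rank sets the accuracy
approx_predictor = XU @ (U.T @ beta)
predictor_error = np.max(np.abs(X @ beta - approx_predictor))
```

Downstream computations can then work with the $N \times r$ matrix `XU` in place of the $N \times D$ matrix `X`, with the rank $r$ controlling the trade-off between cost and accuracy.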
The accurate approximation of high-dimensional functions is an essential task in uncertainty quantification and many other fields. We propose a new function approximation scheme based on a spectral extension of the tensor-train (TT) decomposition. We first define a functional version of the TT decomposition and analyze its properties. We obtain results on the convergence of the decomposition, revealing links between the regularity of the function, the dimension of the input space, and the TT ranks. We also show that the regularity of the target function is preserved by the univariate functions (i.e., the cores) comprising the functional TT decomposition. This result motivates an approximation scheme employing polynomial approximations of the cores. For functions with appropriate regularity, the resulting \textit{spectral tensor-train decomposition} combines the favorable dimension-scaling of the TT decomposition with the spectral convergence rate of polynomial approximations, yielding efficient and accurate surrogates for high-dimensional functions. To construct these decompositions, we use the sampling algorithm \texttt{TT-DMRG-cross} to obtain the TT decomposition of tensors resulting from suitable discretizations of the target function. We assess the performance of the method on a range of numerical examples: a modified set of Genz functions with dimension up to $100$, and functions with mixed Fourier modes or with local features. We observe significant improvements in performance over an anisotropic adaptive Smolyak approach. The method is also used to approximate the solution of an elliptic PDE with random input data. The open source software and examples presented in this work are available online.
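For readers unfamiliar with the TT decomposition of a discretized function, the following minimal sketch may help. It uses the classical TT-SVD procedure on a full tensor, which is a different and far less scalable technique than the sampling-based TT-DMRG-cross algorithm named above; the function names, the test function $\sin(x_1 + x_2 + x_3 + x_4)$, and the grid are assumptions chosen for illustration.

```python
import numpy as np

def tt_svd(tensor, tol=1e-10):
    """Decompose a full tensor into TT cores of shape (r_prev, n_k, r_next)."""
    shape = tensor.shape
    d = len(shape)
    cores = []
    rank = 1
    mat = tensor.reshape(rank * shape[0], -1)
    for k in range(d - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        # Keep singular values above a relative tolerance
        new_rank = max(1, int(np.sum(s > tol * s[0])))
        cores.append(u[:, :new_rank].reshape(rank, shape[k], new_rank))
        # Carry the remainder forward and expose the next mode
        mat = s[:new_rank, None] * vt[:new_rank]
        rank = new_rank
        mat = mat.reshape(rank * shape[k + 1], -1)
    cores.append(mat.reshape(rank, shape[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract TT cores back into a full tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full[0, ..., 0]

# Discretize f(x) = sin(x1 + x2 + x3 + x4) on a tensor grid of 8^4 points
x = np.linspace(0.0, 1.0, 8)
grids = np.meshgrid(x, x, x, x, indexing="ij")
F = np.sin(grids[0] + grids[1] + grids[2] + grids[3])

cores = tt_svd(F)
tt_ranks = [core.shape[2] for core in cores[:-1]]
reconstruction_error = np.max(np.abs(tt_to_full(cores) - F))
```

For this function every TT rank equals 2, a consequence of the angle-addition formula, so the cores store far fewer numbers than the full tensor; cross-approximation algorithms aim to find such cores from a small number of function evaluations without ever forming the full tensor.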