We recover jump-sparse and sparse signals from blurred incomplete data corrupted by (possibly non-Gaussian) noise using inverse Potts energy functionals. We obtain analytical results (existence of minimizers, complexity) on inverse Potts functionals and provide relations to sparsity problems. We then propose a new optimization method for these functionals which is based on dynamic programming and the alternating direction method of multipliers (ADMM). A series of experiments shows that the proposed method yields very satisfactory jump-sparse and sparse reconstructions, respectively. We highlight the capability of the method by comparing it with classical and recent approaches such as TV minimization (jump-sparse signals), orthogonal matching pursuit, iterative hard thresholding, and iteratively reweighted $\ell^1$ minimization (sparse signals).
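As an illustration of the dynamic-programming ingredient mentioned above, the following minimal sketch solves the classical (direct-data) Potts problem, that is, it minimizes $\gamma \cdot \#\{\text{jumps of } u\} + \sum_i (u_i - f_i)^2$ exactly. It assumes no blur operator and Gaussian-type (squared) data fidelity, so it is not the full ADMM scheme of the abstract; the function name `potts_denoise` and the parameter `gamma` are hypothetical.

```python
def potts_denoise(f, gamma):
    """Exact minimizer of gamma * (#jumps of u) + sum((u[i] - f[i])**2).

    Classical O(n^2) dynamic program; a sketch of the direct-data case only.
    """
    n = len(f)
    # Prefix sums of f and f^2 give the squared error of any segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(f):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def seg_err(l, r):
        # Squared deviation of f[l:r] (r exclusive) from its mean.
        s = s1[r] - s1[l]
        return max(s2[r] - s2[l] - s * s / (r - l), 0.0)

    # best[r] = optimal energy for the prefix f[:r]; last[r] = start of its
    # final constant segment.
    best = [float("inf")] * (n + 1)
    best[0] = -gamma  # offsets the penalty so the first segment costs no jump
    last = [0] * (n + 1)
    for r in range(1, n + 1):
        for l in range(r):
            cand = best[l] + gamma + seg_err(l, r)
            if cand < best[r]:
                best[r] = cand
                last[r] = l

    # Backtrack: fill each optimal segment with its mean value.
    u = [0.0] * n
    r = n
    while r > 0:
        l = last[r]
        mean = (s1[r] - s1[l]) / (r - l)
        for i in range(l, r):
            u[i] = mean
        r = l
    return u
```

A larger `gamma` yields fewer jumps; in an ADMM splitting, a subproblem of this form would be solved repeatedly with updated data.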
We propose to compute a sparse approximate inverse Cholesky factor $L$ of a dense covariance matrix $\Theta$ by minimizing the Kullback-Leibler divergence between the Gaussian distributions $\mathcal{N}(0, \Theta)$ and $\mathcal{N}(0, L^{-\top} L^{-1})$, subject to a sparsity constraint. Surprisingly, this problem has a closed-form solution that can be computed efficiently, recovering the popular Vecchia approximation in spatial statistics. Based on recent results on the approximate sparsity of inverse Cholesky factors of $\Theta$ obtained from pairwise evaluation of Green's functions of elliptic boundary-value problems at points $\{x_{i}\}_{1 \leq i \leq N} \subset \mathbb{R}^{d}$, we propose an elimination ordering and sparsity pattern that allows us to compute $\epsilon$-approximate inverse Cholesky factors of such $\Theta$ in computational complexity $\mathcal{O}(N \log(N/\epsilon)^d)$ in space and $\mathcal{O}(N \log(N/\epsilon)^{2d})$ in time. To the best of our knowledge, this is the best asymptotic complexity for this class of problems. Furthermore, our method is embarrassingly parallel, automatically exploits low-dimensional structure in the data, and can perform Gaussian-process regression in linear (in $N$) space complexity. Motivated by the optimality properties of our method, we propose ways of applying it to the joint covariance of training and prediction points in Gaussian-process regression, greatly improving stability and computational cost. Finally, we show how to apply our method to the important setting of Gaussian processes with additive noise, sacrificing neither accuracy nor computational complexity.
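The closed-form solution mentioned above can be sketched column by column. The sketch assumes the column-wise formula $L_{s_i, i} = \Theta_{s_i,s_i}^{-1} e_1 / \sqrt{e_1^\top \Theta_{s_i,s_i}^{-1} e_1}$, where $s_i$ is the set of allowed nonzero rows of column $i$, listed with $i$ first. The function name `kl_inverse_cholesky` and the `pattern` argument are hypothetical, and the elimination ordering and sparsity pattern proposed in the abstract are not reproduced here.

```python
import numpy as np

def kl_inverse_cholesky(theta, pattern):
    """Sparse lower-triangular L with L @ L.T approximating inv(theta).

    pattern[i] lists the allowed nonzero rows of column i; it must start
    with i itself and otherwise contain only indices larger than i.
    """
    n = theta.shape[0]
    L = np.zeros((n, n))
    for i, s in enumerate(pattern):
        s = list(s)
        block = theta[np.ix_(s, s)]      # dense sub-block Theta_{s,s}
        e1 = np.zeros(len(s))
        e1[0] = 1.0
        col = np.linalg.solve(block, e1)  # Theta_{s,s}^{-1} e_1
        L[s, i] = col / np.sqrt(col[0])   # col[0] = e_1^T Theta_{s,s}^{-1} e_1
    return L
```

Each column depends only on its own sub-block of $\Theta$, which is why the columns can be computed independently in parallel. With the full lower-triangular pattern the factor is exact, which gives a simple sanity check.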
We examine sparse grid quadrature on weighted tensor products (WTP) of reproducing kernel Hilbert spaces on products of the unit sphere, in the case of worst case quadrature error for rules with arbitrary quadrature weights. We describe a dimension adaptive quadrature algorithm based on an algorithm of Hegland (2003), and also formulate a version of Wasilkowski and Wozniakowski's WTP algorithm (1999), here called the WW algorithm. We prove that the dimension adaptive algorithm is optimal in the sense of Dantzig (1957) and therefore no greater in cost than the WW algorithm. Both algorithms therefore have the optimal asymptotic rate of convergence given by Theorem 3 of Wasilkowski and Wozniakowski (1999). A numerical example shows that, even though the asymptotic convergence rate is optimal, if the dimension weights decay slowly enough, and the dimensionality of the problem is large enough, the initial convergence of the dimension adaptive algorithm can be slow.
The randomized sparse Kaczmarz method was recently proposed to recover sparse solutions of linear systems. In this work, we introduce a greedy variant of the randomized sparse Kaczmarz method by employing the sampling Kaczmarz-Motzkin method, and prove its linear convergence in expectation with respect to the Bregman distance in the noiseless and noisy cases. This greedy variant can be viewed as a unification of the sampling Kaczmarz-Motzkin method and the randomized sparse Kaczmarz method, and hence inherits the merits of these two methods. Numerically, we report a couple of experimental results to demonstrate its superiority.
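One plausible reading of the greedy variant described above is sketched below: at each step a small random subset of rows is sampled, the row with the largest residual in that subset is selected (the Kaczmarz-Motzkin idea), and a sparse Kaczmarz step with soft thresholding is applied. The selection criterion (absolute residual scaled by the row norm), the function names, and all parameter values are assumptions for illustration rather than the authors' exact scheme.

```python
import random

def soft_threshold(z, lam):
    # Componentwise shrinkage S_lam(z) = sign(z) * max(|z| - lam, 0).
    return [max(abs(v) - lam, 0.0) * (1.0 if v > 0 else -1.0) for v in z]

def greedy_sparse_kaczmarz(A, b, lam=0.1, sample_size=2, iters=2000, seed=0):
    """Sketch of a sampled-greedy sparse Kaczmarz iteration for A x = b.

    A is a list of rows (no zero rows assumed), b a list of right-hand sides.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    norms_sq = [sum(a * a for a in row) for row in A]
    z = [0.0] * n  # auxiliary (dual) iterate
    x = [0.0] * n  # sparse primal iterate, x = S_lam(z)

    def residual(i):
        return sum(a * v for a, v in zip(A[i], x)) - b[i]

    for _ in range(iters):
        # Sample a subset of rows, then pick the most violated one in it.
        candidates = rng.sample(range(m), sample_size)
        i = max(candidates, key=lambda k: abs(residual(k)) / norms_sq[k] ** 0.5)
        # Kaczmarz projection step on z, followed by soft thresholding.
        step = residual(i) / norms_sq[i]
        z = [zj - step * aj for zj, aj in zip(z, A[i])]
        x = soft_threshold(z, lam)
    return x
```

Setting `sample_size=1` reduces the sketch to a uniformly randomized sparse Kaczmarz iteration, while `sample_size=len(A)` gives a fully greedy row choice.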
Information is extracted from large and sparse data sets organized as 3-mode tensors. Two methods are described, based on best rank-(2,2,2) and rank-(2,2,1) approximation of the tensor. The first method can be considered as a generalization of spectral graph partitioning to tensors, and it gives a reordering of the tensor that clusters the information. The second method gives an expansion of the tensor in sparse rank-(2,2,1) terms, where the terms correspond to graphs. The low-rank approximations are computed using an efficient Krylov-Schur type algorithm that avoids filling in the sparse data. The methods are applied to topic search in news text, a tensor representing conference author-terms-years, and network traffic logs.
The problem of partitioning a large and sparse tensor is considered, where the tensor consists of a sequence of adjacency matrices. Theory is developed that is a generalization of spectral graph partitioning. A best rank-$(2,2,\lambda)$ approximation is computed for $\lambda=1,2,3$, and the partitioning is computed from the orthogonal matrices and the core tensor of the approximation. It is shown that if the tensor has a certain reducibility structure, then the solution of the best approximation problem exhibits the reducibility structure of the tensor. Further, if the tensor is close to being reducible, then the solution of the best approximation problem still exhibits the structure of the tensor. Numerical examples with synthetic data corroborate the theoretical results. Experiments with tensors from applications show that the method can be used to extract relevant information from large, sparse, and noisy data.
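For orientation, the matrix technique that the two tensor abstracts above generalize, spectral graph partitioning, can be sketched for a single adjacency matrix. The sketch uses the standard Fiedler-vector bisection and is not the tensor method itself; the function name `fiedler_partition` is hypothetical, and the graph is assumed to be undirected and connected.

```python
import numpy as np

def fiedler_partition(adj):
    """Bisect an undirected, connected graph by the sign of its Fiedler vector."""
    adj = np.asarray(adj, dtype=float)
    # Graph Laplacian: degree matrix minus adjacency matrix.
    lap = np.diag(adj.sum(axis=1)) - adj
    # eigh returns eigenvalues in ascending order; column 1 belongs to the
    # second-smallest eigenvalue (the Fiedler vector).
    _, vecs = np.linalg.eigh(lap)
    return vecs[:, 1] >= 0
```

For a graph made of two dense groups joined by a few edges, the returned boolean labels separate the two groups; which group receives `True` depends on the arbitrary sign of the eigenvector.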