Sparse Cholesky factorization by Kullback-Leibler minimization

Added by Florian Schäfer

Publication date 2020

Fields Informatics Engineering

Language English

Authors Florian Schäfer - Matthias Katzfuss

Numerical Analysis Numerical Analysis Optimization and Control

We propose to compute a sparse approximate inverse Cholesky factor $L$ of a dense covariance matrix $\Theta$ by minimizing the Kullback-Leibler divergence between the Gaussian distributions $\mathcal{N}(0, \Theta)$ and $\mathcal{N}(0, L^{-\top} L^{-1})$, subject to a sparsity constraint. Surprisingly, this problem has a closed-form solution that can be computed efficiently, recovering the popular Vecchia approximation in spatial statistics. Based on recent results on the approximate sparsity of inverse Cholesky factors of $\Theta$ obtained from pairwise evaluation of Green's functions of elliptic boundary-value problems at points $\{x_{i}\}_{1 \leq i \leq N} \subset \mathbb{R}^{d}$, we propose an elimination ordering and sparsity pattern that allow us to compute $\epsilon$-approximate inverse Cholesky factors of such $\Theta$ in computational complexity $\mathcal{O}(N \log(N/\epsilon)^d)$ in space and $\mathcal{O}(N \log(N/\epsilon)^{2d})$ in time. To the best of our knowledge, this is the best asymptotic complexity for this class of problems. Furthermore, our method is embarrassingly parallel, automatically exploits low-dimensional structure in the data, and can perform Gaussian-process regression in linear (in $N$) space complexity. Motivated by the optimality properties of our method, we propose methods for applying it to the joint covariance of training and prediction points in Gaussian-process regression, greatly improving stability and computational cost. Finally, we show how to apply our method to the important setting of Gaussian processes with additive noise, sacrificing neither accuracy nor computational complexity.
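The closed-form solution mentioned in the abstract can be sketched in a few lines of NumPy. The abstract does not state the formula; the sketch below assumes the column-wise form usually given for this problem, in which column $i$ of $L$, restricted to its allowed rows $s_i$ (with $i$ listed first), is $\Theta_{s_i,s_i}^{-1} e_1 / \sqrt{e_1^{\top} \Theta_{s_i,s_i}^{-1} e_1}$. The function names, the exponential kernel, and the banded sparsity pattern are illustrative assumptions, not the ordering or pattern proposed by the authors.

```python
import numpy as np

def kl_optimal_factor(theta, pattern):
    """Sparse lower-triangular L whose columns follow the assumed closed form.

    pattern[i] lists the rows allowed to be nonzero in column i,
    with pattern[i][0] == i and all other entries > i.
    """
    n = theta.shape[0]
    L = np.zeros((n, n))
    for i, rows in enumerate(pattern):
        rows = np.asarray(rows)
        block = theta[np.ix_(rows, rows)]   # Theta restricted to s_i
        e1 = np.zeros(len(rows))
        e1[0] = 1.0
        col = np.linalg.solve(block, e1)    # Theta_{s,s}^{-1} e_1
        L[rows, i] = col / np.sqrt(col[0])  # normalize by sqrt(e_1^T Theta_{s,s}^{-1} e_1)
    return L

def kl_divergence(theta, L):
    """KL( N(0, Theta) || N(0, (L L^T)^{-1}) )."""
    n = theta.shape[0]
    trace_term = np.trace(L.T @ theta @ L)
    logdet_theta = np.linalg.slogdet(theta)[1]
    logdet_precision = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (trace_term - n - logdet_precision - logdet_theta)

# Hypothetical test problem: exponential kernel on random points in the unit square.
rng = np.random.default_rng(0)
x = rng.random((30, 2))
dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
theta = np.exp(-dist / 0.3)
n = len(x)

full = [list(range(i, n)) for i in range(n)]                 # dense lower triangle
banded = [list(range(i, min(i + 6, n))) for i in range(n)]   # arbitrary sparse pattern

L_full = kl_optimal_factor(theta, full)
L_sparse = kl_optimal_factor(theta, banded)
kl_full = kl_divergence(theta, L_full)
kl_sparse = kl_divergence(theta, L_sparse)
```

With the dense pattern the factor reproduces the exact inverse, so `kl_full` is zero up to rounding; the banded pattern gives a nonnegative `kl_sparse`. Each column needs only its own small block of $\Theta$, which is why the columns can be computed independently and in parallel.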

Jump-sparse and sparse recovery using Potts functionals

365 - Martin Storath, Andreas Weinmann, Laurent Demaret 2013

We recover jump-sparse and sparse signals from blurred incomplete data corrupted by (possibly non-Gaussian) noise using inverse Potts energy functionals. We obtain analytical results (existence of minimizers, complexity) on inverse Potts functionals and provide relations to sparsity problems. We then propose a new optimization method for these functionals which is based on dynamic programming and the alternating direction method of multipliers (ADMM). A series of experiments shows that the proposed method yields very satisfactory jump-sparse and sparse reconstructions, respectively. We highlight the capability of the method by comparing it with classical and recent approaches such as TV minimization (jump-sparse signals), orthogonal matching pursuit, iterative hard thresholding, and iteratively reweighted $\ell^1$ minimization (sparse signals).
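The dynamic-programming ingredient can be illustrated on the simplest case. The sketch below solves the classical direct 1D Potts problem (squared error plus `gamma` times the number of jumps) in $O(n^2)$ time; it is not the inverse Potts method with ADMM described in the abstract, and the name `potts_1d` and the test signal are illustrative assumptions.

```python
import numpy as np

def potts_1d(f, gamma):
    """Minimize gamma * (#jumps of u) + sum((u - f)**2) over piecewise-constant u."""
    f = np.asarray(f, dtype=float)
    n = len(f)
    # Prefix sums give the squared error of the best constant on f[l:r] in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(f)))
    s2 = np.concatenate(([0.0], np.cumsum(f ** 2)))

    def seg_err(l, r):
        total = s1[r] - s1[l]
        return (s2[r] - s2[l]) - total * total / (r - l)

    best = np.full(n + 1, np.inf)   # best[r]: optimal cost for the prefix f[:r]
    best[0] = -gamma                # so that the first segment is not charged a jump
    start = np.zeros(n + 1, dtype=int)
    for r in range(1, n + 1):
        for l in range(r):
            cost = best[l] + gamma + seg_err(l, r)
            if cost < best[r]:
                best[r] = cost
                start[r] = l

    # Backtrack the segment boundaries and fill each segment with its mean.
    u = np.empty(n)
    r = n
    while r > 0:
        l = start[r]
        u[l:r] = (s1[r] - s1[l]) / (r - l)
        r = l
    return u

signal = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]
u_small = potts_1d(signal, gamma=1.0)    # cheap jumps: the single step is kept
u_large = potts_1d(signal, gamma=100.0)  # expensive jumps: one constant segment
```

A small `gamma` keeps the jump in the signal, while a large `gamma` collapses the reconstruction to the global mean.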

Numerical Analysis Numerical Analysis Optimization and Control

Primal-Dual Algorithms for Non-negative Matrix Factorization with the Kullback-Leibler Divergence

579 - Felipe Yanez, Francis Bach (LIENS 2014

Non-negative matrix factorization (NMF) approximates a given matrix as a product of two non-negative matrices. Multiplicative algorithms deliver reliable results, but they show slow convergence for high-dimensional data and may be stuck away from local minima. Gradient descent methods have better behavior, but only apply to smooth losses such as the least-squares loss. In this article, we propose a first-order primal-dual algorithm for non-negative decomposition problems (where one factor is fixed) with the KL divergence, based on the Chambolle-Pock algorithm. All required computations may be obtained in closed form and we provide an efficient heuristic way to select step-sizes. By using alternating optimization, our algorithm readily extends to NMF and, on synthetic examples, face recognition or music source separation datasets, it is either faster than existing algorithms, or leads to improved local optima, or both.
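For context, the multiplicative baseline that the abstract contrasts against can be sketched as follows. This is the standard multiplicative-update scheme for the generalized KL divergence, not the primal-dual algorithm proposed in the paper; the function names, the small `eps` safeguard, and the random test matrix are assumptions made for illustration.

```python
import numpy as np

def generalized_kl(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH) for non-negative matrices."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))

def nmf_kl_multiplicative(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Alternate multiplicative updates of H and W; returns W, H and the loss history."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 0.1
    H = rng.random((rank, n)) + 0.1
    history = []
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        history.append(generalized_kl(V, W @ H))
    return W, H, history

rng = np.random.default_rng(1)
V = rng.random((6, 5)) + 0.1   # strictly positive toy data
W, H, history = nmf_kl_multiplicative(V, rank=2)
```

The updates keep both factors non-negative by construction and do not increase the loss from one iteration to the next, but they can converge slowly, which is the behavior the abstract sets out to improve on.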

Machine Learning Optimization and Control

Optimistic Reinforcement Learning by Forward Kullback-Leibler Divergence Optimization

299 - Taisuke Kobayashi 2021

This paper addresses a new interpretation of reinforcement learning (RL) as reverse Kullback-Leibler (KL) divergence optimization, and derives a new optimization method using forward KL divergence. Although RL originally aims to maximize return indirectly through optimization of the policy, recent work by Levine has proposed a different derivation process with explicit consideration of optimality as a stochastic variable. This paper follows this concept and formulates the traditional learning laws for both the value function and the policy as optimization problems with reverse KL divergence including optimality. Focusing on the asymmetry of KL divergence, new optimization problems with forward KL divergence are derived. Remarkably, these new optimization problems can be regarded as optimistic RL. That optimism is intuitively specified by a hyperparameter converted from an uncertainty parameter. In addition, it can be enhanced when it is integrated with prioritized experience replay and eligibility traces, both of which accelerate learning. The effects of this expected optimism were investigated through learning tendencies in numerical simulations using PyBullet. As a result, moderate optimism accelerated learning and yielded higher rewards. In a realistic robotic simulation, the proposed method with moderate optimism outperformed one of the state-of-the-art RL methods.
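The asymmetry of the KL divergence that the abstract relies on is easy to check numerically. The sketch below uses the closed-form KL divergence between two univariate Gaussians; the two distributions are arbitrary illustrative choices and are unrelated to the paper's RL formulation.

```python
import math

def kl_gauss(m1, s1, m2, s2):
    """KL( N(m1, s1^2) || N(m2, s2^2) ) for univariate Gaussians."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

# p = N(0, 1), q = N(0, 2^2): swapping the arguments changes the value.
kl_pq = kl_gauss(0.0, 1.0, 0.0, 2.0)   # log 2 + 1/8 - 1/2
kl_qp = kl_gauss(0.0, 2.0, 0.0, 1.0)   # -log 2 + 2 - 1/2
```

Here `kl_pq` is about 0.318 and `kl_qp` is about 0.807, so minimizing one direction is a different problem from minimizing the other.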

Machine Learning

New rigorous perturbation bounds for the generalized Cholesky factorization

349 - Hanyu Li, Yanfei Yang 2014

Some new rigorous perturbation bounds for the generalized Cholesky factorization with normwise or componentwise perturbations in the given matrix are obtained, where the componentwise perturbation has the form of backward rounding error for the generalized Cholesky factorization algorithm. These bounds can be much tighter than some existing ones while the conditions for them to hold are simple and moderate.

Numerical Analysis

RCHOL: Randomized Cholesky Factorization for Solving SDD Linear Systems

71 - Chao Chen, Tianyu Liang, George Biros 2020

We introduce a randomized algorithm, namely RCHOL, to construct an approximate Cholesky factorization for a given Laplacian matrix (a.k.a., graph Laplacian). From a graph perspective, the exact Cholesky factorization introduces a clique in the underlying graph after eliminating a row/column. By randomization, RCHOL only retains a sparse subset of the edges in the clique using a random sampling developed by Spielman and Kyng. We prove RCHOL is breakdown-free and apply it to solving large sparse linear systems with symmetric diagonally dominant matrices. In addition, we parallelize RCHOL based on the nested dissection ordering for shared-memory machines. We report numerical experiments that demonstrate the robustness and the scalability of RCHOL. For example, our parallel code scaled up to 64 threads on a single node for solving the 3D Poisson equation, discretized with the 7-point stencil on a $1024 \times 1024 \times 1024$ grid, a problem that has one billion unknowns.
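The clique created by exact elimination can be seen on a small example. The sketch below eliminates the center of a star graph with four unit-weight leaves and inspects the resulting Schur complement; it only shows the exact fill-in and does not implement RCHOL's random sampling. The graph and variable names are illustrative assumptions.

```python
import numpy as np

k = 4  # number of leaves; vertex 0 is the center of the star
laplacian = np.zeros((k + 1, k + 1))
laplacian[0, 0] = k
for j in range(1, k + 1):
    laplacian[0, j] = laplacian[j, 0] = -1.0   # unit-weight edge center-leaf
    laplacian[j, j] = 1.0

# One step of exact Cholesky elimination on vertex 0 leaves this Schur complement.
schur = laplacian[1:, 1:] - np.outer(laplacian[1:, 0], laplacian[0, 1:]) / laplacian[0, 0]

off_diagonal = schur[~np.eye(k, dtype=bool)]
```

Before elimination the leaves share no edges; afterwards every pair of leaves is connected with weight 1/4, and the Schur complement is again a graph Laplacian (its rows sum to zero). Randomized variants keep only a sampled subset of these new edges.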

Numerical Analysis Mathematical Software Numerical Analysis
