Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

67 0 0.0 ( 0 )

Download Cite

Added by Kyungjoo Kim

Publication date 2016

fields Informatics Engineering

and research's language is English

Authors Kyungjoo Kim - Sivasankaran Rajamanickam - George Stelle

Mathematical Software

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using an open-source framework for manycore platforms i.e., Kokkos. A performance evaluation is presented on both Intel Sandybridge and Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate merits of the proposed task-based factorization. Experimental results demonstrate that our task-parallel implementation delivers about 26.6x speedup (geometric mean) over single-threaded incomplete Cholesky-by-blocks and 19.2x speedup over serial Cholesky performance which does not carry tasking overhead using 56 threads on the Intel Xeon Phi processor for sparse matrices arising from various application problems.

rate research

Sparse Cholesky factorization by Kullback-Leibler minimization

111 - Florian Schafer , Matthias Katzfuss , 2020

We propose to compute a sparse approximate inverse Cholesky factor $L$ of a dense covariance matrix $Theta$ by minimizing the Kullback-Leibler divergence between the Gaussian distributions $mathcal{N}(0, Theta)$ and $mathcal{N}(0, L^{-top} L^{-1})$, subject to a sparsity constraint. Surprisingly, this problem has a closed-form solution that can be computed efficiently, recovering the popular Vecchia approximation in spatial statistics. Based on recent results on the approximate sparsity of inverse Cholesky factors of $Theta$ obtained from pairwise evaluation of Greens functions of elliptic boundary-value problems at points ${x_{i}}_{1 leq i leq N} subset mathbb{R}^{d}$, we propose an elimination ordering and sparsity pattern that allows us to compute $epsilon$-approximate inverse Cholesky factors of such $Theta$ in computational complexity $mathcal{O}(N log(N/epsilon)^d)$ in space and $mathcal{O}(N log(N/epsilon)^{2d})$ in time. To the best of our knowledge, this is the best asymptotic complexity for this class of problems. Furthermore, our method is embarrassingly parallel, automatically exploits low-dimensional structure in the data, and can perform Gaussian-process regression in linear (in $N$) space complexity. Motivated by the optimality properties of our methods, we propose methods for applying it to the joint covariance of training and prediction points in Gaussian-process regression, greatly improving stability and computational cost. Finally, we show how to apply our method to the important setting of Gaussian processes with additive noise, sacrificing neither accuracy nor computational complexity.

Numerical Analysis Numerical Analysis Optimization and Control

A 55-line code for large-scale parallel topology optimization in 2D and 3D

79 - Abhinav Gupta , Rajib Chowdhury , Anupam Chakrabarti 2020

This paper presents a 55-line code written in python for 2D and 3D topology optimization (TO) based on the open-source finite element computing software (FEniCS), equipped with various finite element tools and solvers. PETSc is used as the linear algebra back-end, which results in significantly less computational time than standard python libraries. The code is designed based on the popular solid isotropic material with penalization (SIMP) methodology. Extensions to multiple load cases, different boundary conditions, and incorporation of passive elements are also presented. Thus, this implementation is the most compact implementation of SIMP based topology optimization for 3D as well as 2D problems. Utilizing the concept of Euclidean distance matrix to vectorize the computation of the weight matrix for the filter, we have achieved a substantial reduction in the computational time and have also made it possible for the code to work with complex ground structure configurations. We have also presented the codes extension to large-scale topology optimization problems with support for parallel computations on complex structural configuration, which could help students and researchers explore novel insights into the TO problem with dense meshes. Appendix-A contains the complete code, and the website: url{https://github.com/iitrabhi/topo-fenics} also contains the complete code.

Mathematical Software Computational Engineering Optimization and Control

FlexDM: Enabling robust and reliable parallel data mining using WEKA

203 - Madison Flannery , David M Budden , Alexandre Mendes 2014

Performing massive data mining experiments with multiple datasets and methods is a common task faced by most bioinformatics and computational biology laboratories. WEKA is a machine learning package designed to facilitate this task by providing tools that allow researchers to select from several classification methods and specific test strategies. Despite its popularity, the current WEKA environment for batch experiments, namely Experimenter, has four limitations that impact its usability: the selection of value ranges for methods options lacks flexibility and is not intuitive; there is no support for parallelisation when running large-scale data mining tasks; the XML schema is difficult to read, necessitating the use of the Experimenters graphical user interface for generation and modification; and robustness is limited by the fact that results are not saved until the last test has concluded. FlexDM implements an interface to WEKA to run batch processing tasks in a simple and intuitive way. In a short and easy-to-understand XML file, one can define hundreds of tests to be performed on several datasets. FlexDM also allows those tests to be executed asynchronously in parallel to take advantage of multi-core processors, significantly increasing usability and productivity. Results are saved incrementally for better robustness and reliability. FlexDM is implemented in Java and runs on Windows, Linux and OSX. As we encourage other researchers to explore and adopt our software, FlexDM is made available as a pre-configured bootable reference environment. All code, supporting documentation and usage examples are also available for download at http://sourceforge.net/projects/flexdm.

Mathematical Software Software Engineering

New rigorous perturbation bounds for the generalized Cholesky factorization

130 - Hanyu Li , Yanfei Yang 2014

Some new rigorous perturbation bounds for the generalized Cholesky factorization with normwise or componentwise perturbations in the given matrix are obtained, where the componentwise perturbation has the form of backward rounding error for the generalized Cholesky factorization algorithm. These bounds can be much tighter than some existing ones while the conditions for them to hold are simple and moderate.

Numerical Analysis

Reverse Cholesky factorization and tensor products of nest algebras

52 - Vern I. Paulsen , Hugo J. Woerdeman 2017

We prove that every positive semidefinite matrix over the natural numbers that is eventually 0 in each row and column can be factored as the product of an upper triangular matrix times a lower triangular matrix. We also extend some known results about factorization with respect to tensor products of nest algebras. Our proofs use the theory of reproducing kernel Hilbert spaces.

Functional Analysis