ترغب بنشر مسار تعليمي؟ اضغط هنا

Inv-ASKIT: A Parallel Fast Diret Solver for Kernel Matrices

67   0   0.0 ( 0 )
 نشر من قبل William March
 تاريخ النشر 2016
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We present a parallel algorithm for computing the approximate factorization of an $N$-by-$N$ kernel matrix. Once this factorization has been constructed (with $N log^2 N $ work), we can solve linear systems with this matrix with $N log N $ work. Kernel matrices represent pairwise interactions of points in metric spaces. They appear in machine learning, approximation theory, and computational physics. Kernel matrices are typically dense (matrix multiplication scales quadratically with $N$) and ill-conditioned (solves can require 100s of Krylov iterations). Thus, fast algorithms for matrix multiplication and factorization are critical for scalability. Recently we introduced ASKIT, a new method for approximating a kernel matrix that resembles N-body methods. Here we introduce INV-ASKIT, a factorization scheme based on ASKIT. We describe the new method, derive complexity estimates, and conduct an empirical study of its accuracy and scalability. We report results on real-world datasets including COVTYPE ($0.5$M points in 54 dimensions), SUSY ($4.5$M points in 8 dimensions) and MNIST (2M points in 784 dimensions) using shared and distributed memory parallelism. In our largest run we approximately factorize a dense matrix of size 32M $times$ 32M (generated from points in 64 dimensions) on 4,096 Sandy-Bridge cores. To our knowledge these results improve the state of the art by several orders of magnitude.

قيم البحث

اقرأ أيضاً

The paper presents a combination of the time-parallel parallel full approximation scheme in space and time (PFASST) with a parallel multigrid method (PMG) in space, resulting in a mesh-based solver for the three-dimensional heat equation with a uniqu ely high degree of efficient concurrency. Parallel scaling tests are reported on the Cray XE6 machine Monte Rosa on up to 16,384 cores and on the IBM Blue Gene/Q system JUQUEEN on up to 65,536 cores. The efficacy of the combined spatial- and temporal parallelization is shown by demonstrating that using PFASST in addition to PMG significantly extends the strong-scaling limit. Implications of using spatial coarsening strategies in PFASSTs multi-level hierarchy in large-scale parallel simulations are discussed.
62 - Michiel de Bondt 2017
We show that Boolean matrix multiplication, computed as a sum of products of column vectors with row vectors, is essentially the same as Warshalls algorithm for computing the transitive closure matrix of a graph from its adjacency matrix. Warshalls algorithm can be generalized to Floyds algorithm for computing the distance matrix of a graph with weighted edges. We will generalize Boolean matrices in the same way, keeping matrix multiplication essentially equivalent to the Floyd-Warshall algorithm. This way, we get matrices over a semiring, which are similar to the so-called funny matrices. We discuss our implementation of operations on Boolean matrices and on their generalization, which make use of vector instructions.
Matrix computations, especially iterative PDE solving (and the sparse matrix vector multiplication subproblem within) using conjugate gradient algorithm, and LU/Cholesky decomposition for solving system of linear equations, form the kernel of many ap plications, such as circuit simulators, computational fluid dynamics or structural analysis etc. The problem of designing approaches for parallelizing these computations, to get good speedups as much as possible as per Amdahls law, has been continuously researched upon. In this paper, we discuss approaches based on the use of finite projective geometry graphs for these two problems. For the problem of conjugate gradient algorithm, the approach looks at an alternative data distribution based on projective-geometry concepts. It is proved that this data distribution is an optimal data distribution for scheduling the main problem of dense matrix-vector multiplication. For the problem of parallel LU/Cholesky decomposition of general matrices, the approach is motivated by the recently published scheme for interconnects of distributed systems, perfect difference networks. We find that projective-geometry based graphs indeed offer an exciting way of parallelizing these computations, and in fact many others. Moreover, their applications ranges from architectural ones (interconnect choice) to algorithmic ones (data distributions).
82 - Ehsan Rohani , Gwan Choi , Mi Lu 2017
In this paper we introduce the algorithm and the fixed point hardware to calculate the normalized singular value decomposition of a non-symmetric matrices using Givens fast (approximate) rotations. This algorithm only uses the basic combinational log ic modules such as adders, multiplexers, encoders, Barrel shifters (B-shifters), and comparators and does not use any lookup table. This method in fact combines the iterative properties of singular value decomposition method and CORDIC method in one single iteration. The introduced architecture is a systolic architecture that uses two different types of processors, diagonal and non-diagonal processors. The diagonal processor calculates, transmits and applies the horizontal and vertical rotations, while the non-diagonal processor uses a fully combinational architecture to receive, and apply the rotations. The diagonal processor uses priority encoders, Barrel shifters, and comparators to calculate the rotation angles. Both processors use a series of adders to apply the rotation angles. The design presented in this work provides $2.83sim649$ times better energy per matrix performance compared to the state of the art designs. This performance achieved without the employment of pipelining; a better performance advantage is expected to be achieved employing pipelining.
This paper describes the algorithms, features and implementation of PyDEC, a Python library for computations related to the discretization of exterior calculus. PyDEC facilitates inquiry into both physical problems on manifolds as well as purely topo logical problems on abstract complexes. We describe efficient algorithms for constructing the operators and objects that arise in discrete exterior calculus, lowest order finite element exterior calculus and in related topological problems. Our algorithms are formulated in terms of high-level matrix operations which extend to arbitrary dimension. As a result, our implementations map well to the facilities of numerical libraries such as NumPy and SciPy. The availability of such libraries makes Python suitable for prototyping numerical methods. We demonstrate how PyDEC is used to solve physical and topological problems through several concise examples.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا