
Parallel Tiled QR Factorization for Multicore Architectures

Added by Julien Langou
Publication date: 2007
Language: English

As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated, or new algorithms have to be developed, in order to take advantage of the architectural features of these new processors. Fine-grain parallelism becomes a major requirement and introduces the need for loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the QR factorization in which the operations can be represented as a sequence of small tasks that operate on square blocks of data. These tasks can be dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out-of-order execution of the tasks that completely hides the presence of intrinsically sequential tasks in the factorization. Performance comparisons are presented with the LAPACK algorithm for QR factorization, where parallelism can only be exploited at the level of the BLAS operations.
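For orientation, the sketch below mimics the loop nest underlying a tiled QR factorization of a square matrix partitioned into b-by-b tiles: factor the diagonal tile, update the rest of its tile row, then eliminate each tile below the diagonal and propagate that elimination across the trailing tiles. This is a minimal sequential sketch in NumPy, not the paper's blocked Householder kernels or its dynamic scheduler; the function name tiled_qr, the tiling helper, and the use of dense np.linalg.qr in place of specialized kernels are illustrative assumptions.

```python
import numpy as np

def tiled_qr(A, b):
    """Sequential sketch of the tiled QR loop nest (no task scheduler).
    A (n x n, n divisible by b) is overwritten with the R factor."""
    n = A.shape[0]
    assert A.shape == (n, n) and n % b == 0
    t = n // b                                            # tiles per dimension
    tile = lambda i, j: A[i*b:(i+1)*b, j*b:(j+1)*b]       # view of tile (i, j)

    for k in range(t):
        # Factor the diagonal tile, then apply Q^T across tile row k.
        Q, R = np.linalg.qr(tile(k, k))
        tile(k, k)[:] = R
        for j in range(k + 1, t):
            tile(k, j)[:] = Q.T @ tile(k, j)

        for i in range(k + 1, t):
            # Eliminate tile (i, k) by factoring it stacked under the current R.
            Q2, R2 = np.linalg.qr(np.vstack([tile(k, k), tile(i, k)]),
                                  mode="complete")        # Q2 is 2b x 2b
            tile(k, k)[:] = R2[:b]
            tile(i, k)[:] = 0.0
            for j in range(k + 1, t):
                # Propagate the same transformation to the trailing tile pair.
                pair = Q2.T @ np.vstack([tile(k, j), tile(i, j)])
                tile(k, j)[:] = pair[:b]
                tile(i, j)[:] = pair[b:]
    return A

# Quick check against the dense problem: R^T R must equal A^T A.
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
R = tiled_qr(M.copy(), b=2)
assert np.allclose(R.T @ R, M.T @ M)
```

In the task-based formulation described in the abstract, each of the four tile updates above becomes a node in a dependency graph, so tiles in different columns can be processed out of order as soon as their inputs are ready, rather than in the fixed loop order shown here.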



Related research


A fast implicit QR algorithm for eigenvalue computation of low-rank corrections of unitary matrices is adjusted to work with matrix pencils arising from polynomial zerofinding problems. The modified QZ algorithm computes the generalized eigenvalues of certain $N \times N$ rank-structured matrix pencils using $O(N^2)$ ops and $O(N)$ memory storage. Numerical experiments and comparisons confirm the effectiveness and the stability of the proposed method.
Sanjiva Prasad (2016)
Based on the two observations that diverse applications perform better on different multicore architectures, and that different phases of an application may have vastly different resource requirements, Pal et al. proposed a novel reconfigurable hardware approach for executing multithreaded programs. Instead of mapping a concurrent program to a fixed architecture, the architecture adaptively reconfigures itself to meet the application's concurrency and communication requirements, yielding significant improvements in performance. Based on our earlier abstract operational framework for multicore execution with hierarchical memory structures, we describe the execution of multithreaded programs on reconfigurable architectures that support a variety of clustered configurations. Such reconfiguration may not preserve the semantics of programs due to the possible introduction of race conditions arising from concurrent accesses to shared memory by threads running on the different cores. We present an intuitive partial ordering notion on the cluster configurations, and show that the semantics of multithreaded programs is always preserved for reconfigurations upward in that ordering, whereas semantics preservation for arbitrary reconfigurations can be guaranteed for well-synchronised programs. We further show that a simple approximate notion of efficiency of execution on the different configurations can be obtained using the notion of amortised bisimulations, and extend it to dynamic reconfiguration.
Some fast algorithms for computing the eigenvalues of a block companion matrix $A = U + XY^H$, where $U \in \mathbb{C}^{n \times n}$ is unitary block circulant and $X, Y \in \mathbb{C}^{n \times k}$, have recently appeared in the literature. Most of these algorithms rely on the decomposition of $A$ as a product of scalar companion matrices, which turns into a factored representation of the Hessenberg reduction of $A$. In this paper we generalize the approach to encompass Hessenberg matrices of the form $A = U + XY^H$ where $U$ is a general unitary matrix. A remarkable case is $U$ unitary diagonal, which makes it possible to deal with interpolation techniques for rootfinding problems and nonlinear eigenvalue problems. Our extension exploits the properties of a larger matrix $\hat{A}$ obtained by a certain embedding of the Hessenberg reduction of $A$ suitable to maintain its structural properties. We show that $\hat{A}$ can be factored as a product of lower and upper unitary Hessenberg matrices possibly perturbed in the first $k$ rows, and, moreover, such a data-sparse representation is well suited for the design of fast eigensolvers based on the QR/QZ iteration. The resulting algorithm is fast and backward stable.
Spectral computations of infinite-dimensional operators are notoriously difficult, yet ubiquitous in the sciences. Indeed, despite more than half a century of research, it is still unknown which classes of operators allow for computation of spectra and eigenvectors with convergence rates and error control. Recent progress in classifying the difficulty of spectral problems into complexity hierarchies has revealed that the most difficult spectral problems are so hard that one needs three limits in the computation, and no convergence rates nor error control is possible. This begs the question: which classes of operators allow for computations with convergence rates and error control? In this paper we address this basic question, and the algorithm used is an infinite-dimensional version of the QR algorithm. Indeed, we generalise the QR algorithm to infinite-dimensional operators. We prove that not only is the algorithm executable on a finite machine, but one can also recover the extremal parts of the spectrum and corresponding eigenvectors, with convergence rates and error control. This allows for new classification results in the hierarchy of computational problems that existing algorithms have not been able to capture. The algorithm and convergence theorems are demonstrated on a wealth of examples with comparisons to standard approaches (that are notorious for providing false solutions). We also find that in some cases the IQR algorithm performs better than predicted by theory and make conjectures for future study.
New real structure-preserving decompositions are introduced to develop fast and robust algorithms for the (right) eigenproblem of general quaternion matrices. Under orthogonal JRS-symplectic transformations, the Francis JRS-QR step and the JRS-QR algorithm are first proposed for JRS-symmetric matrices and then applied to calculate the Schur forms of quaternion matrices. A novel quaternion Givens matrix is defined and utilized to compute the QR factorization of quaternion Hessenberg matrices. An implicit double-shift quaternion QR algorithm is presented, with a technique for automatically choosing shifts, using only real operations. Numerical experiments are provided to demonstrate the efficiency and accuracy of the newly proposed algorithms.