As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated, or new algorithms developed, in order to take advantage of the architectural features of these new processors. Fine-grain parallelism becomes a major requirement and introduces the need for loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the QR factorization in which the operations can be represented as a sequence of small tasks that operate on square blocks of data. These tasks can be dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out-of-order execution of the tasks which completely hides the presence of intrinsically sequential tasks in the factorization. Performance comparisons are presented against the LAPACK algorithm for QR factorization, where parallelism can only be exploited at the level of the BLAS operations.
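As an illustrative sketch (not the paper's implementation), the following NumPy code performs a tile-by-tile QR reduction of the kind described above: each factorization of a diagonal or stacked tile, and each trailing-tile update, is one small task, and only the data dependencies between tasks constrain the order in which a runtime could execute them. The tile size `b`, the assumption that the matrix dimensions are multiples of `b`, and the discarding of the orthogonal factor are simplifications for clarity.

```python
import numpy as np

def tile_qr_R(A, b):
    """Right-looking tile QR sketch: reduces A to an upper triangular factor R
    by orthogonal transformations applied tile by tile.

    Each inner operation (factor the diagonal tile, update the tiles to its
    right, annihilate a tile below it, update the corresponding tile pairs)
    is one small task; only data dependencies order their execution.
    Assumes both dimensions of A are multiples of the tile size b."""
    A = A.astype(float)
    m, n = A.shape
    mt, nt = m // b, n // b
    for k in range(min(mt, nt)):
        dk = slice(k * b, (k + 1) * b)
        # factor the diagonal tile
        Q, A[dk, dk] = np.linalg.qr(A[dk, dk])
        # update tasks: apply Q^T to the tiles right of the diagonal
        A[dk, (k + 1) * b:] = Q.T @ A[dk, (k + 1) * b:]
        for i in range(k + 1, mt):
            di = slice(i * b, (i + 1) * b)
            # annihilate tile (i, k) against the diagonal tile
            Q2, R2 = np.linalg.qr(np.vstack([A[dk, dk], A[di, dk]]),
                                  mode='complete')
            A[dk, dk], A[di, dk] = R2[:b], 0.0
            # apply the same transformation to the trailing tile pair
            pair = Q2.T @ np.vstack([A[dk, (k + 1) * b:], A[di, (k + 1) * b:]])
            A[dk, (k + 1) * b:], A[di, (k + 1) * b:] = pair[:b], pair[b:]
    return np.triu(A)
```

In a real tiled library the orthogonal transformations are stored compactly and the loop nest above is flattened into a task graph, so independent tasks from different iterations can run concurrently and out of order.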
A fast implicit QR algorithm for eigenvalue computation of low-rank corrections of unitary matrices is adjusted to work with matrix pencils arising from polynomial zerofinding problems. The modified QZ algorithm computes the generalized eigenvalues of certain $N\times N$ rank-structured matrix pencils using $O(N^2)$ operations and $O(N)$ memory. Numerical experiments and comparisons confirm the effectiveness and the stability of the proposed method.
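For context, the kind of pencil referred to above can be illustrated with the standard companion linearization of a scalar polynomial, whose generalized eigenvalues are the polynomial's roots. The sketch below (a hedged, unstructured baseline, not the paper's method) builds such a pencil and solves it with the dense QZ solver in SciPy, which costs $O(N^3)$; the example polynomial and its coefficients are purely illustrative.

```python
import numpy as np
from scipy.linalg import eig

def companion_pencil(c):
    """Companion pencil (A, B) of p(z) = c[0] + c[1] z + ... + c[n] z^n.

    The generalized eigenvalues of (A, B) are the roots of p; keeping the
    leading coefficient in B avoids dividing by it."""
    c = np.asarray(c, dtype=complex)
    n = len(c) - 1
    A = np.zeros((n, n), dtype=complex)
    A[1:, :-1] = np.eye(n - 1)          # subdiagonal of ones
    A[:, -1] = -c[:n]                   # last column: -c_0, ..., -c_{n-1}
    B = np.eye(n, dtype=complex)
    B[-1, -1] = c[n]
    return A, B

# example: p(z) = z^3 - 6 z^2 + 11 z - 6 = (z - 1)(z - 2)(z - 3)
A, B = companion_pencil([-6, 11, -6, 1])
roots = eig(A, B, right=False)          # dense QZ baseline
print(np.sort(roots.real))              # approximately [1, 2, 3]
```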
Based on the two observations that diverse applications perform better on different multicore architectures, and that different phases of an application may have vastly different resource requirements, Pal et al. proposed a novel reconfigurable hardware approach for executing multithreaded programs. Instead of mapping a concurrent program to a fixed architecture, the architecture adaptively reconfigures itself to meet the application's concurrency and communication requirements, yielding significant improvements in performance. Based on our earlier abstract operational framework for multicore execution with hierarchical memory structures, we describe the execution of multithreaded programs on reconfigurable architectures that support a variety of clustered configurations. Such reconfiguration may not preserve the semantics of programs due to the possible introduction of race conditions arising from concurrent accesses to shared memory by threads running on the different cores. We present an intuitive partial ordering notion on the cluster configurations, and show that the semantics of multithreaded programs is always preserved for reconfigurations upward in that ordering, whereas semantics preservation for arbitrary reconfigurations can be guaranteed for well-synchronised programs. We further show that a simple approximate notion of efficiency of execution on the different configurations can be obtained using the notion of amortised bisimulations, and extend it to dynamic reconfiguration.
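The race-condition concern can be made concrete with a generic shared-memory example in Python (this is only an illustration, not the paper's operational framework or reconfiguration model): an unsynchronised read-modify-write on shared state may lose updates when thread interleavings change, whereas the lock-protected, well-synchronised version gives the same result under any scheduling.

```python
import threading
import time

counter = 0
lock = threading.Lock()

def unsync_worker(iters):
    """Read-modify-write without synchronisation: updates can be lost."""
    global counter
    for _ in range(iters):
        tmp = counter
        time.sleep(0)        # encourage a thread switch between read and write
        counter = tmp + 1

def sync_worker(iters):
    """The 'well-synchronised' version: the lock serialises the update."""
    global counter
    for _ in range(iters):
        with lock:
            counter += 1

def run(worker, iters=1000, nthreads=4):
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(iters,))
               for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print("unsynchronised:", run(unsync_worker))     # typically far below 4000
print("well-synchronised:", run(sync_worker))    # always 4000
```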
Some fast algorithms for computing the eigenvalues of a block companion matrix $A = U + XY^H$, where $U \in \mathbb{C}^{n\times n}$ is unitary block circulant and $X, Y \in \mathbb{C}^{n \times k}$, have recently appeared in the literature. Most of these algorithms rely on the decomposition of $A$ as a product of scalar companion matrices, which yields a factored representation of the Hessenberg reduction of $A$. In this paper we generalize the approach to encompass Hessenberg matrices of the form $A = U + XY^H$ where $U$ is a general unitary matrix. A remarkable case is $U$ unitary diagonal, which makes it possible to deal with interpolation techniques for rootfinding problems and nonlinear eigenvalue problems. Our extension exploits the properties of a larger matrix $\hat A$ obtained by a certain embedding of the Hessenberg reduction of $A$ suitable for maintaining its structural properties. We show that $\hat A$ can be factored as a product of lower and upper unitary Hessenberg matrices possibly perturbed in the first $k$ rows, and, moreover, that such a data-sparse representation is well suited for the design of fast eigensolvers based on the QR/QZ iteration. The resulting algorithm is fast and backward stable.
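As a point of reference only (not the paper's algorithm), the sketch below builds a matrix of the form $A = U + XY^H$ with $U$ unitary diagonal, the case highlighted above, and computes its eigenvalues with a dense unstructured solver; the sizes $n$ and $k$ are illustrative assumptions. The paper's fast eigensolver instead runs the QR/QZ iteration on a data-sparse factored representation of the Hessenberg reduction rather than on the dense matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3                      # illustrative sizes

# U unitary diagonal (e.g. nodes on the unit circle, as in interpolation-based
# rootfinding), plus a rank-k correction X Y^H
U = np.diag(np.exp(2j * np.pi * rng.random(n)))
X = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
Y = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
A = U + X @ Y.conj().T

# Dense, unstructured baseline: O(n^2) storage and O(n^3) work, ignoring
# the unitary-plus-low-rank structure that the fast method exploits.
eigvals = np.linalg.eigvals(A)
print(eigvals[:5])
```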
Spectral computations of infinite-dimensional operators are notoriously difficult, yet ubiquitous in the sciences. Indeed, despite more than half a century of research, it is still unknown which classes of operators allow for computation of spectra and eigenvectors with convergence rates and error control. Recent progress in classifying the difficulty of spectral problems into complexity hierarchies has revealed that the most difficult spectral problems are so hard that one needs three limits in the computation, and neither convergence rates nor error control are possible. This raises the question: which classes of operators allow for computations with convergence rates and error control? In this paper we address this basic question, and the algorithm used is an infinite-dimensional version of the QR algorithm (IQR). Indeed, we generalise the QR algorithm to infinite-dimensional operators. We prove that not only is the algorithm executable on a finite machine, but one can also recover the extremal parts of the spectrum and corresponding eigenvectors, with convergence rates and error control. This allows for new classification results in the hierarchy of computational problems that existing algorithms have not been able to capture. The algorithm and convergence theorems are demonstrated on a wealth of examples with comparisons to standard approaches (which are notorious for providing false solutions). We also find that in some cases the IQR algorithm performs better than predicted by theory, and make conjectures for future study.
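As a small, concrete point of comparison (a naive finite-section truncation, not the IQR algorithm of the paper), consider the free Jacobi operator on $\ell^2(\mathbb{N})$ with 2 on the diagonal and $-1$ on the first off-diagonals, whose spectrum is the interval $[0, 4]$. The extremal eigenvalues of its $n \times n$ truncations, computed with a standard finite-dimensional QR-based eigensolver, approach the edges of the spectrum as $n$ grows.

```python
import numpy as np

def finite_section(n):
    """n x n truncation of the free Jacobi operator on l^2(N):
    2 on the diagonal, -1 on the first off-diagonals; the spectrum of the
    full infinite-dimensional operator is the interval [0, 4]."""
    T = 2.0 * np.eye(n)
    T -= np.diag(np.ones(n - 1), 1)
    T -= np.diag(np.ones(n - 1), -1)
    return T

for n in (10, 100, 1000):
    ev = np.linalg.eigvalsh(finite_section(n))   # dense symmetric eigensolver
    print(n, ev.min(), ev.max())                 # extremes approach 0 and 4
```

For less well-behaved (for instance non-self-adjoint) operators, such truncations are known to produce spurious eigenvalues, the false solutions mentioned above, which is precisely where convergence rates and error control matter.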
New real structure-preserving decompositions are introduced to develop fast and robust algorithms for the (right) eigenproblem of general quaternion matrices. Under orthogonally JRS-symplectic transformations, the Francis JRS-QR step and the JRS-QR algorithm are first proposed for JRS-symmetric matrices and then applied to calculate the Schur forms of quaternion matrices. A novel quaternion Givens matrix is defined and utilized to compute the QR factorization of quaternion Hessenberg matrices. An implicit double-shift quaternion QR algorithm is presented, with a technique for automatically choosing shifts, carried out entirely in real arithmetic. Numerical experiments are provided to demonstrate the efficiency and accuracy of the newly proposed algorithms.
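To make the real structure-preserving idea concrete, the sketch below forms a common real counterpart of a quaternion matrix $Q = A_0 + A_1 i + A_2 j + A_3 k$ and checks that quaternion matrix multiplication is respected; the JRS-symmetric arrangement used in the paper may differ from this convention by a fixed permutation and sign pattern, so the block layout here is only illustrative.

```python
import numpy as np

def real_counterpart(A0, A1, A2, A3):
    """A common 4n x 4n real representation of Q = A0 + A1*i + A2*j + A3*k.
    (The JRS-symmetric arrangement in the paper may differ by a fixed
    permutation/sign pattern; this convention is only illustrative.)"""
    return np.block([
        [A0, -A1, -A2, -A3],
        [A1,  A0, -A3,  A2],
        [A2,  A3,  A0, -A1],
        [A3, -A2,  A1,  A0],
    ])

def quat_matmul(A, B):
    """Product of quaternion matrices given as 4-tuples of real blocks,
    using Hamilton's rules i^2 = j^2 = k^2 = ijk = -1."""
    A0, A1, A2, A3 = A
    B0, B1, B2, B3 = B
    return (A0 @ B0 - A1 @ B1 - A2 @ B2 - A3 @ B3,
            A0 @ B1 + A1 @ B0 + A2 @ B3 - A3 @ B2,
            A0 @ B2 - A1 @ B3 + A2 @ B0 + A3 @ B1,
            A0 @ B3 + A1 @ B2 - A2 @ B1 + A3 @ B0)

rng = np.random.default_rng(0)
n = 5
A = tuple(rng.standard_normal((n, n)) for _ in range(4))
B = tuple(rng.standard_normal((n, n)) for _ in range(4))

# The real representation is multiplicative, which is what allows algorithms
# to work entirely in real arithmetic on the structured real counterpart.
lhs = real_counterpart(*quat_matmul(A, B))
rhs = real_counterpart(*A) @ real_counterpart(*B)
print(np.allclose(lhs, rhs))   # True
```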