As the ratio between the rate of computation and the rate at which data can be retrieved from the various layers of memory continues to deteriorate, a question arises: will the current best algorithms for computing matrix-matrix multiplication on future CPUs continue to be (near) optimal? This paper provides compelling analytical and empirical evidence that the answer is no. The analytical results guide us to a new family of algorithms of which the current state-of-the-art Goto's algorithm is but one member. The empirical results, on architectures that were custom-built to reduce the amount of bandwidth to main memory, show that under different circumstances, different members of the family become superior. Thus, this family will likely start playing a prominent role going forward.
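Goto's algorithm is commonly described as five loops around a small micro-kernel, with blocks of A and B packed into contiguous buffers sized for particular cache levels. The plain-Python sketch below shows only that loop structure. It assumes row-major list-of-lists matrices. The names `goto_gemm` and `micro_kernel` and the block sizes `mc`, `kc`, `nc`, `mr`, `nr` are hypothetical illustrative choices, not tuned values. It fixes one classic loop ordering and says nothing about performance.

```python
def micro_kernel(Ap, Bp, C, ir, jr, ic, jc, kb, mrb, nrb):
    # Accumulate an mrb x nrb block of C as a sum of kb rank-1 updates
    acc = [[0.0] * nrb for _ in range(mrb)]
    for p in range(kb):
        for i in range(mrb):
            a = Ap[ir + i][p]
            for j in range(nrb):
                acc[i][j] += a * Bp[p][jr + j]
    # Write the accumulated block back into C
    for i in range(mrb):
        for j in range(nrb):
            C[ic + ir + i][jc + jr + j] += acc[i][j]


def goto_gemm(A, B, C, mc=4, kc=4, nc=8, mr=2, nr=2):
    """C += A @ B using five loops around a micro-kernel (illustrative block sizes)."""
    m, k = len(A), len(A[0])
    n = len(B[0])
    for jc in range(0, n, nc):                  # 5th loop: column panels of B and C
        nb = min(nc, n - jc)
        for pc in range(0, k, kc):              # 4th loop: rank-kc updates
            kb = min(kc, k - pc)
            # Pack a kb x nb panel of B (typically sized for L3 in BLIS-style codes)
            Bp = [[B[pc + p][jc + j] for j in range(nb)] for p in range(kb)]
            for ic in range(0, m, mc):          # 3rd loop: row blocks of A
                mb = min(mc, m - ic)
                # Pack an mb x kb block of A (typically sized for L2)
                Ap = [[A[ic + i][pc + p] for p in range(kb)] for i in range(mb)]
                for jr in range(0, nb, nr):     # 2nd loop: micro-panels of Bp
                    for ir in range(0, mb, mr): # 1st loop: micro-panels of Ap
                        micro_kernel(Ap, Bp, C, ir, jr, ic, jc, kb,
                                     min(mr, mb - ir), min(nr, nb - jr))
    return C
```

Here the packed panel of B is rebuilt for every `(jc, pc)` pair and the packed block of A for every `ic`. The other members of a broader family would plausibly differ in which packed block is kept resident at which cache level.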
Matrix multiplication (GEMM) is a core operation in numerous scientific applications. Traditional implementations of Strassen-like fast matrix multiplication (FMM) algorithms often do not perform well except for very large matrix sizes, due to the in
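For context, the textbook one-level Strassen scheme replaces the eight block products of a 2×2 block partition with seven, at the price of extra block additions. The sketch below assumes square matrices of even dimension stored as lists of lists, and all helper names are hypothetical. It materializes each sum such as A11 + A22 as a separate temporary matrix, which illustrates the extra memory traffic of such traditional implementations.

```python
def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def naive_mul(X, Y):
    # Classical triple-loop product; zip(*Y) iterates over the columns of Y
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def split(M):
    # Partition M into four equal quadrants M11, M12, M21, M22
    h, w = len(M) // 2, len(M[0]) // 2
    return ([r[:w] for r in M[:h]], [r[w:] for r in M[:h]],
            [r[:w] for r in M[h:]], [r[w:] for r in M[h:]])


def strassen_one_level(A, B):
    """One level of Strassen: 7 block products instead of 8 (even, square inputs)."""
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Every mat_add/mat_sub below allocates a temporary block
    M1 = naive_mul(mat_add(A11, A22), mat_add(B11, B22))
    M2 = naive_mul(mat_add(A21, A22), B11)
    M3 = naive_mul(A11, mat_sub(B12, B22))
    M4 = naive_mul(A22, mat_sub(B21, B11))
    M5 = naive_mul(mat_add(A11, A12), B22)
    M6 = naive_mul(mat_sub(A21, A11), mat_add(B11, B12))
    M7 = naive_mul(mat_sub(A12, A22), mat_add(B21, B22))
    C11 = mat_add(mat_sub(mat_add(M1, M4), M5), M7)
    C12 = mat_add(M3, M5)
    C21 = mat_add(M2, M4)
    C22 = mat_add(mat_add(mat_sub(M1, M2), M3), M6)
    # Reassemble the quadrants into the full result
    return [r1 + r2 for r1, r2 in zip(C11, C12)] + [r1 + r2 for r1, r2 in zip(C21, C22)]
```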
We approach the problem of implementing mixed-datatype support within the general matrix multiplication (GEMM) operation of the BLIS framework, whereby each matrix operand A, B, and C may be stored as single- or double-precision real or complex value
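The abstract concerns operands that differ in domain (real or complex) as well as in precision. The sketch below illustrates only the domain aspect. When A is real and B is complex, no complex multiplications are needed, because A·B = A·Re(B) + i·A·Im(B), so the product reduces to two real GEMMs. It assumes list-of-lists matrices with Python `complex` entries. The function names are hypothetical, and this decomposition is one straightforward option, not a description of how BLIS actually handles the case.

```python
def real_gemm(A, B):
    """Plain real matrix product of list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def gemm_real_times_complex(A, B, C):
    """C += A @ B where A is real and B, C are complex.

    Since A has no imaginary part, A @ B = A @ Re(B) + 1j * (A @ Im(B)),
    so only two real GEMMs are performed.
    """
    Br = [[z.real for z in row] for row in B]   # real part of B
    Bi = [[z.imag for z in row] for row in B]   # imaginary part of B
    Pr, Pi = real_gemm(A, Br), real_gemm(A, Bi)
    for i, row in enumerate(C):
        for j in range(len(row)):
            row[j] += complex(Pr[i][j], Pi[i][j])
    return C
```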
We propose several new schedules for Strassen-Winograd's matrix multiplication algorithm that reduce the extra memory allocation requirements in three different ways: by introducing a few pre-additions, by overwriting the input matrices, or by using
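For reference, the Strassen-Winograd variant needs 7 block products and 15 block additions: 8 before the products and 7 after. The sketch below spells out that well-known formulation with a fresh temporary for every intermediate. It assumes square, even-dimension list-of-lists matrices, and the helper names are hypothetical. It does not reproduce the memory-saving schedules the abstract proposes, which differ precisely in how these temporaries are ordered and overwritten.

```python
def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def naive_mul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def split(M):
    h, w = len(M) // 2, len(M[0]) // 2
    return ([r[:w] for r in M[:h]], [r[w:] for r in M[:h]],
            [r[:w] for r in M[h:]], [r[w:] for r in M[h:]])


def strassen_winograd_one_level(A, B):
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # 8 pre-additions
    S1 = mat_add(A21, A22)
    S2 = mat_sub(S1, A11)
    S3 = mat_sub(A11, A21)
    S4 = mat_sub(A12, S2)
    T1 = mat_sub(B12, B11)
    T2 = mat_sub(B22, T1)
    T3 = mat_sub(B22, B12)
    T4 = mat_sub(T2, B21)
    # 7 block products (recursive in practice, naive here)
    P1 = naive_mul(A11, B11)
    P2 = naive_mul(A12, B21)
    P3 = naive_mul(S4, B22)
    P4 = naive_mul(A22, T4)
    P5 = naive_mul(S1, T1)
    P6 = naive_mul(S2, T2)
    P7 = naive_mul(S3, T3)
    # 7 post-additions
    U1 = mat_add(P1, P2)   # C11
    U2 = mat_add(P1, P6)
    U3 = mat_add(U2, P7)
    U4 = mat_add(U2, P5)
    U5 = mat_add(U4, P3)   # C12
    U6 = mat_sub(U3, P4)   # C21
    U7 = mat_add(U3, P5)   # C22
    return [r1 + r2 for r1, r2 in zip(U1, U5)] + [r1 + r2 for r1, r2 in zip(U6, U7)]
```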
Quaternion symmetry is ubiquitous in the physical sciences. As such, much work has been devoted over the years to developing efficient schemes to exploit this symmetry using real and complex linear algebra. Recent years have also seen many a
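One common way to reduce quaternion arithmetic to real linear algebra uses the fact that left multiplication by a quaternion q = a + bi + cj + dk is a linear map on R^4 with a fixed 4×4 real matrix. This is offered as an illustration of the general idea, not as the authors' scheme. The sketch below represents quaternions as (w, x, y, z) tuples, and all function names are hypothetical. It defines that matrix alongside the Hamilton product. It then uses the matrix to turn an m×k by k×n quaternion matrix product into a single real product of a 4m×4k matrix with a 4k×n matrix.

```python
def hamilton(q, p):
    """Quaternion product q*p, with quaternions as (w, x, y, z) = w + xi + yj + zk."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def left_matrix(q):
    """Real 4x4 matrix L(q) with L(q) @ p == q*p for every quaternion p."""
    a, b, c, d = q
    return [[a, -b, -c, -d],
            [b,  a, -d,  c],
            [c,  d,  a, -b],
            [d, -c,  b,  a]]


def quat_matmul_via_real(A, B):
    """Quaternion product A (m x k) @ B (k x n) computed as one real GEMM."""
    m, k, n = len(A), len(B), len(B[0])
    # Embed A as a 4m x 4k real matrix made of L(A[i][p]) blocks
    RA = [[0.0] * (4 * k) for _ in range(4 * m)]
    for i in range(m):
        for p in range(k):
            L = left_matrix(A[i][p])
            for r in range(4):
                for s in range(4):
                    RA[4 * i + r][4 * p + s] = L[r][s]
    # Embed B as a 4k x n real matrix: each entry becomes a column 4-vector
    RB = [[B[p][j][r] for j in range(n)] for p in range(k) for r in range(4)]
    RC = [[sum(RA[row][t] * RB[t][j] for t in range(4 * k)) for j in range(n)]
          for row in range(4 * m)]
    # Regroup each 4-vector of the real result into a quaternion tuple
    return [[tuple(RC[4 * i + r][j] for r in range(4)) for j in range(n)]
            for i in range(m)]
```

Because quaternion multiplication is not commutative, the embedding must use the left-multiplication matrix of the A entries. Reversing the roles of A and B would require the right-multiplication matrix instead.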
We present algorithms for real and complex dot product and matrix multiplication in arbitrary-precision floating-point and ball arithmetic. A low-overhead dot product is implemented on the level of GMP limb arrays; it is about twice as fast as previo
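The core idea behind an exact dot product can be shown without GMP. Every finite double is an integer times a power of two, so each product x[i]*y[i] is exactly a big-integer mantissa with an exponent. Aligning all terms to the smallest exponent lets them be summed exactly in one big integer, followed by a single final rounding. The sketch below uses Python's built-in arbitrary-precision `int` as a stand-in for GMP limb arrays. `exact_dot` is a hypothetical name, and inputs are assumed to be finite doubles treated as exact (zero-radius balls). The returned radius is a conservative one-ulp bound on the final rounding error. The sketch makes no attempt at the low overhead the abstract reports.

```python
import math


def exact_dot(xs, ys):
    """Return (mid, rad): a ball containing the exact dot product of finite doubles."""
    terms = []
    for x, y in zip(xs, ys):
        mx, ex = math.frexp(x)          # x == mx * 2**ex with 0.5 <= |mx| < 1 (or x == 0)
        my, ey = math.frexp(y)
        ix, iy = int(mx * 2**53), int(my * 2**53)   # exact 53-bit integer mantissas
        terms.append((ix * iy, ex + ey - 106))      # x*y == ix*iy * 2**(ex+ey-106)
    if not terms:
        return 0.0, 0.0
    emin = min(e for _, e in terms)
    # Align every term to the smallest exponent and sum exactly in one big integer
    acc = sum(m << (e - emin) for m, e in terms)
    # Single rounding: int -> float and int / int are correctly rounded in CPython
    if emin >= 0:
        mid = float(acc << emin)
    else:
        mid = acc / (1 << -emin)
    return mid, math.ulp(mid)           # one ulp safely bounds the rounding error
```

For example, the naive left-to-right sum of 1e16*1 + 1*1 + (-1e16)*1 in doubles loses the middle term and returns 0.0. The exact accumulation returns a midpoint of 1.0.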