We describe an efficient implementation of a hierarchy of algorithms for multiplication of dense matrices over the field with two elements (GF(2)). In particular, we present our implementation -- in the M4RI library -- of Strassen-Winograd matrix multiplication and the Method of the Four Russians multiplication (M4RM) and compare it against other available implementations. Good performance is demonstrated on AMD's Opteron and particularly good performance on Intel's Core 2 Duo. The open-source M4RI library is available stand-alone as well as part of the Sage mathematics software. In machine terms, addition in GF(2) is logical XOR and multiplication is logical AND, so a 64-bit machine word allows one to operate on 64 elements of GF(2) in parallel: at most one CPU cycle for 64 parallel additions or multiplications. As such, element-wise operations over GF(2) are relatively cheap. In fact, in this paper we conclude that the actual bottlenecks are memory reads and writes and issues of data locality. We present our empirical findings in relation to minimizing these and give an analysis thereof.
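The word-parallel arithmetic described above can be illustrated with a minimal sketch in C. It assumes GF(2) vectors are bit-packed into arrays of `uint64_t`, 64 elements per word; the function names (`gf2_vec_add`, `gf2_vec_mul`, `gf2_dot`) are hypothetical and chosen for illustration only. They are not the M4RI API, and the sketch makes no attempt at the cache-aware layout the abstract identifies as the real bottleneck.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of M4RI. */

/* Addition over GF(2): dst[i] = a[i] XOR b[i].
 * One XOR on a 64-bit word adds 64 field elements at once. */
void gf2_vec_add(uint64_t *dst, const uint64_t *a, const uint64_t *b,
                 size_t nwords)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < nwords; i++)
        dst[i] = a[i] ^ b[i];
}

/* Element-wise multiplication over GF(2): dst[i] = a[i] AND b[i].
 * One AND on a 64-bit word multiplies 64 pairs of elements at once. */
void gf2_vec_mul(uint64_t *dst, const uint64_t *a, const uint64_t *b,
                 size_t nwords)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < nwords; i++)
        dst[i] = a[i] & b[i];
}

/* Sum over GF(2) of the 64 bits in one word (its parity),
 * computed by repeatedly XOR-folding the word onto itself. */
static unsigned gf2_word_parity(uint64_t w)
{
    w ^= w >> 32;
    w ^= w >> 16;
    w ^= w >> 8;
    w ^= w >> 4;
    w ^= w >> 2;
    w ^= w >> 1;
    return (unsigned)(w & 1u);
}

/* Dot product over GF(2): AND gives the element-wise products,
 * XOR accumulates them, and the parity of the accumulator is the sum. */
unsigned gf2_dot(const uint64_t *a, const uint64_t *b, size_t nwords)
{
    assert(a != NULL && b != NULL);
    uint64_t acc = 0;
    for (size_t i = 0; i < nwords; i++)
        acc ^= a[i] & b[i];
    return gf2_word_parity(acc);
}
```

Under these assumptions, one entry of a matrix product over GF(2) is a single `gf2_dot` call on a packed row of the left factor and a packed column of the right factor (that is, a row of its transpose). Each loop iteration costs only a couple of word operations, which is consistent with the observation that memory traffic, not arithmetic, dominates the running time.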
In this work we describe an efficient implementation of a hierarchy of algorithms for the decomposition of dense matrices over the field with two elements (GF(2)). Matrix decomposition is an essential building block for solving dense systems of linea
We propose several new schedules for the Strassen-Winograd matrix multiplication algorithm; they reduce the extra memory allocation requirements by three different means: by introducing a few pre-additions, by overwriting the input matrices, or by using
We introduce a consistent and efficient method to construct self-dual codes over $GF(q)$ with symmetric generator matrices from a self-dual code over $GF(q)$ of smaller length where $q \equiv 1 \pmod 4$. Using this method, we improve the best-known min
Four recursive constructions of permutation polynomials over $\mathrm{GF}(q^2)$ with those over $\mathrm{GF}(q)$ are developed and applied to a few famous classes of permutation polynomials. They produce infinitely many new permutation polynomials over $\mathrm{GF}(q^{2^\ell})$
Sparse matrix-vector multiplication (SpMV) is a fundamental building block for numerous applications. In this paper, we propose CSR5 (Compressed Sparse Row 5), a new storage format, which offers high-throughput SpMV on various platforms including CPU