ترغب بنشر مسار تعليمي؟ اضغط هنا

Designing Strassens algorithm

186   0   0.0 ( 0 )
 نشر من قبل Joshua Grochow
 تاريخ النشر 2017
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

In 1969, Strassen shocked the world by showing that two n x n matrices could be multiplied in time asymptotically less than $O(n^3)$. While the recursive construction in his algorithm is very clear, the key gain was made by showing that 2 x 2 matrix multiplication could be performed with only 7 multiplications instead of 8. The latter construction was arrived at by a process of elimination and appears to come out of thin air. Here, we give the simplest and most transparent proof of Strassens algorithm that we are aware of, using only a simple unitary 2-design and a few easy lines of calculation. Moreover, using basic facts from the representation theory of finite groups, we use 2-designs coming from group orbits to generalize our construction to all n (although the resulting algorithms arent optimal for n at least 3).

قيم البحث

اقرأ أيضاً

We dispel with street wisdom regarding the practical implementation of Strassens algorithm for matrix-matrix multiplication (DGEMM). Conventional wisdom: it is only practical for very large matrices. Our implementation is practical for small matrices . Conventional wisdom: the matrices being multiplied should be relatively square. Our implementation is practical for rank-k updates, where k is relatively small (a shape of importance for libraries like LAPACK). Conventional wisdom: it inherently requires substantial workspace. Our implementation requires no workspace beyond buffers already incorporated into conventional high-performance DGEMM implementations. Conventional wisdom: a Strassen DGEMM interface must pass in workspace. Our implementation requires no such workspace and can be plug-compatible with the standard DGEMM interface. Conventional wisdom: it is hard to demonstrate speedup on multi-core architectures. Our implementation demonstrates speedup over conventional DGEMM even on an Intel(R) Xeon Phi(TM) coprocessor utilizing 240 threads. We show how a distributed memory matrix-matrix multiplication also benefits from these advances.
Tensor contraction (TC) is an important computational kernel widely used in numerous applications. It is a multi-dimensional generalization of matrix multiplication (GEMM). While Strassens algorithm for GEMM is well studied in theory and practice, ex tending it to accelerate TC has not been previously pursued. Thus, we believe this to be the first paper to demonstrate how one can in practice speed up tensor contraction with Strassens algorithm. By adopting a Block-Scatter-Matrix format, a novel matrix-centric tensor layout, we can conceptually view TC as GEMM for a general stride storage, with an implicit tensor-to-matrix transformation. This insight enables us to tailor a recent state-of-the-art implementation of Strassens algorithm to TC, avoiding explicit transpositions (permutations) and extra workspace, and reducing the overhead of memory movement that is incurred. Performance benefits are demonstrated with a performance model as well as in practice on modern single core, multicore, and distributed memory parallel architectures, achieving up to 1.3x speedup. The resulting implementations can serve as a drop-in replacement for various applications with significant speedup.
In 2013, Orlin proved that the max flow problem could be solved in $O(nm)$ time. His algorithm ran in $O(nm + m^{1.94})$ time, which was the fastest for graphs with fewer than $n^{1.06}$ arcs. If the graph was not sufficiently sparse, the fastest run ning time was an algorithm due to King, Rao, and Tarjan. We describe a new variant of the excess scaling algorithm for the max flow problem whose running time strictly dominates the running time of the algorithm by King et al. Moreover, for graphs in which $m = O(n log n)$, the running time of our algorithm dominates that of King et al. by a factor of $O(loglog n)$.
We show that Nederlofs algorithm [Information Processing Letters, 118 (2017), 15-16] for constructing a proof that the number of subsets summing to a particular integer equals a claimed quantity is flawed because: 1) its consistence is not kept; 2) the proposed recurrence formula is incorrect.
140 - Tapani Toivonen 2021
Many combinatorial optimization problems are often considered intractable to solve exactly or by approximation. An example of such problem is maximum clique which -- under standard assumptions in complexity theory -- cannot be solved in sub-exponenti al time or be approximated within polynomial factor efficiently. We show that if a polynomial time algorithm can query informative Gaussian priors from an expert $poly(n)$ times, then a class of combinatorial optimization problems can be solved efficiently in expectation up to a multiplicative factor $epsilon$ where $epsilon$ is arbitrary constant. While our proposed methods are merely theoretical, they cast new light on how to approach solving these problems that have been usually considered intractable.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا