ﻻ يوجد ملخص باللغة العربية
The Variable Block Row (VBR) format is an influential blocked sparse matrix format designed for matrices with shared sparsity structure between adjacent rows and columns. VBR groups adjacent rows and columns, storing the resulting blocks that contain nonzeros in a dense format. This reduces the memory footprint and enables optimizations such as register blocking and instruction-level parallelism. Existing approaches use heuristics to determine which rows and columns should be grouped together. We show that finding the optimal grouping of rows and columns for VBR is NP-hard under several reasonable cost models. In light of this finding, we propose a 1-dimensional variant of VBR, called 1D-VBR, which achieves better performance than VBR by only grouping rows. We describe detailed cost models for runtime and memory consumption. Then, we describe a linear time dynamic programming solution for optimally grouping the rows for 1D-VBR format. We extend our algorithm to produce a heuristic VBR partitioner which alternates between optimally partitioning rows and columns, assuming the columns or rows to be fixed, respectively. Our alternating heuristic produces VBR matrices with the smallest memory footprint of any partitioner we tested.
A long line of research on fixed parameter tractability of integer programming culminated with showing that integer programs with n variables and a constraint matrix with dual tree-depth d and largest entry D are solvable in time g(d,D)poly(n) for so
Low-rank tensors are an established framework for high-dimensional least-squares problems. We propose to extend this framework by including the concept of block-sparsity. In the context of polynomial regression each sparsity pattern corresponds to so
We study the optimization version of the set partition problem (where the difference between the partition sums are minimized), which has numerous applications in decision theory literature. While the set partitioning problem is NP-hard and requires
We present the first work-optimal polylogarithmic-depth parallel algorithm for the minimum cut problem on non-sparse graphs. For $mgeq n^{1+epsilon}$ for any constant $epsilon>0$, our algorithm requires $O(m log n)$ work and $O(log^3 n)$ depth and su
Let svec = (s_1,...,s_m) and tvec = (t_1,...,t_n) be vectors of nonnegative integer-valued functions of m,n with equal sum S = sum_{i=1}^m s_i = sum_{j=1}^n t_j. Let M(svec,tvec) be the number of m*n matrices with nonnegative integer entries such tha