No Arabic abstract
The Variable Block Row (VBR) format is an influential blocked sparse matrix format designed for matrices with shared sparsity structure between adjacent rows and columns. VBR groups adjacent rows and columns, storing the resulting blocks that contain nonzeros in a dense format. This reduces the memory footprint and enables optimizations such as register blocking and instruction-level parallelism. Existing approaches use heuristics to determine which rows and columns should be grouped together. We show that finding the optimal grouping of rows and columns for VBR is NP-hard under several reasonable cost models. In light of this finding, we propose a 1-dimensional variant of VBR, called 1D-VBR, which achieves better performance than VBR by only grouping rows. We describe detailed cost models for runtime and memory consumption. Then, we describe a linear time dynamic programming solution for optimally grouping the rows for 1D-VBR format. We extend our algorithm to produce a heuristic VBR partitioner which alternates between optimally partitioning rows and columns, assuming the columns or rows to be fixed, respectively. Our alternating heuristic produces VBR matrices with the smallest memory footprint of any partitioner we tested.
A long line of research on fixed parameter tractability of integer programming culminated with showing that integer programs with n variables and a constraint matrix with dual tree-depth d and largest entry D are solvable in time g(d,D)poly(n) for some function g. However, the dual tree-depth of a constraint matrix is not preserved by row operations, i.e., a given integer program can be equivalent to another with a smaller dual tree-depth, and thus does not reflect its geometric structure. We prove that the minimum dual tree-depth of a row-equivalent matrix is equal to the branch-depth of the matroid defined by the columns of the matrix. We design a fixed parameter algorithm for computing branch-depth of matroids represented over a finite field and a fixed parameter algorithm for computing a row-equivalent matrix with minimum dual tree-depth. Finally, we use these results to obtain an algorithm for integer programming running in time g(d*,D)poly(n) where d* is the branch-depth of the constraint matrix; the branch-depth cannot be replaced by the more permissive notion of branch-width.
Low-rank tensors are an established framework for high-dimensional least-squares problems. We propose to extend this framework by including the concept of block-sparsity. In the context of polynomial regression each sparsity pattern corresponds to some subspace of homogeneous multivariate polynomials. This allows us to adapt the ansatz space to align better with known sample complexity results. The resulting method is tested in numerical experiments and demonstrates improved computational resource utilization and sample efficiency.
We study the optimization version of the set partition problem (where the difference between the partition sums are minimized), which has numerous applications in decision theory literature. While the set partitioning problem is NP-hard and requires exponential complexity to solve (i.e., intractable); we formulate a weaker version of this NP-hard problem, where the goal is to find a locally optimal solution. We show that our proposed algorithms can find a locally optimal solution in near linear time. Our algorithms require neither positive nor integer elements in the input set, hence, they are more widely applicable.
We present the first work-optimal polylogarithmic-depth parallel algorithm for the minimum cut problem on non-sparse graphs. For $mgeq n^{1+epsilon}$ for any constant $epsilon>0$, our algorithm requires $O(m log n)$ work and $O(log^3 n)$ depth and succeeds with high probability. Its work matches the best $O(m log n)$ runtime for sequential algorithms [MN STOC 2020, GMW SOSA 2021]. This improves the previous best work by Geissmann and Gianinazzi [SPAA 2018] by $O(log^3 n)$ factor, while matching the depth of their algorithm. To do this, we design a work-efficient approximation algorithm and parallelize the recent sequential algorithms [MN STOC 2020; GMW SOSA 2021] that exploit a connection between 2-respecting minimum cuts and 2-dimensional orthogonal range searching.
Let svec = (s_1,...,s_m) and tvec = (t_1,...,t_n) be vectors of nonnegative integer-valued functions of m,n with equal sum S = sum_{i=1}^m s_i = sum_{j=1}^n t_j. Let M(svec,tvec) be the number of m*n matrices with nonnegative integer entries such that the i-th row has row sum s_i and the j-th column has column sum t_j for all i,j. Such matrices occur in many different settings, an important example being the contingency tables (also called frequency tables) important in statistics. Define s=max_i s_i and t=max_j t_j. Previous work has established the asymptotic value of M(svec,tvec) as m,ntoinfty with s and t bounded (various authors independently, 1971-1974), and when svec,tvec are constant vectors with m/n,n/m,s/n >= c/log n for sufficiently large (Canfield and McKay, 2007). In this paper we extend the sparse range to the case st=o(S^(2/3)). The proof in part follows a previous asymptotic enumeration of 0-1 matrices under the same conditions (Greenhill, McKay and Wang, 2006). We also generalise the enumeration to matrices over any subset of the nonnegative integers that includes 0 and 1.