No Arabic abstract
Interior point algorithms for solving linear programs have been studied extensively for a long time [e.g. Karmarkar 1984; Lee, Sidford FOCS14; Cohen, Lee, Song STOC19]. For linear programs of the form $min_{Ax=b, x ge 0} c^top x$ with $n$ variables and $d$ constraints, the generic case $d = Omega(n)$ has recently been settled by Cohen, Lee and Song [STOC19]. Their algorithm can solve linear programs in $tilde O(n^omega log(n/delta))$ expected time, where $delta$ is the relative accuracy. This is essentially optimal as all known linear system solvers require up to $O(n^{omega})$ time for solving $Ax = b$. However, for the case of deterministic solvers, the best upper bound is Vaidyas 30 years old $O(n^{2.5} log(n/delta))$ bound [FOCS89]. In this paper we show that one can also settle the deterministic setting by derandomizing Cohen et al.s $tilde{O}(n^omega log(n/delta))$ time algorithm. This allows for a strict $tilde{O}(n^omega log(n/delta))$ time bound, instead of an expected one, and a simplified analysis, reducing the length of their proof of their central path method by roughly half. Derandomizing this algorithm was also an open question asked in Songs PhD Thesis. The main tool to achieve our result is a new data-structure that can maintain the solution to a linear system in subquadratic time. More accurately we are able to maintain $sqrt{U}A^top(AUA^top)^{-1}Asqrt{U}:v$ in subquadratic time under $ell_2$ multiplicative changes to the diagonal matrix $U$ and the vector $v$. This type of change is common for interior point algorithms. Previous algorithms [e.g. Vaidya STOC89; Lee, Sidford FOCS15; Cohen, Lee, Song STOC19] required $Omega(n^2)$ time for this task. [...]
In this note, we study the expander decomposition problem in a more general setting where the input graph has positively weighted edges and nonnegative demands on its vertices. We show how to extend the techniques of Chuzhoy et al. (FOCS 2020) to this wider setting, obtaining a deterministic algorithm for the problem in almost-linear time.
This paper introduces a new interior point method algorithm that solves semidefinite programming (SDP) with variable size $n times n$ and $m$ constraints in the (current) matrix multiplication time $m^{omega}$ when $m geq Omega(n^2)$. Our algorithm is optimal because even finding a feasible matrix that satisfies all the constraints requires solving an linear system in $m^{omega}$ time. Our work improves the state-of-the-art SDP solver [Jiang, Kathuria, Lee, Padmanabhan and Song, FOCS 2020], and it is the first result that SDP can be solved in the optimal running time. Our algorithm is based on two novel techniques: $bullet$ Maintaining the inverse of a Kronecker product using lazy updates. $bullet$ A general amortization scheme for positive semidefinite matrices.
We study the decremental All-Pairs Shortest Paths (APSP) problem in undirected edge-weighted graphs. The input to the problem is an $n$-vertex $m$-edge graph $G$ with non-negative edge lengths, that undergoes a sequence of edge deletions. The goal is to support approximate shortest-path queries: given a pair $x,y$ of vertices of $G$, return a path $P$ connecting $x$ to $y$, whose length is within factor $alpha$ of the length of the shortest $x$-$y$ path, in time $tilde O(|E(P)|)$, where $alpha$ is the approximation factor of the algorithm. APSP is one of the most basic and extensively studied dynamic graph problems. A long line of work culminated in the algorithm of [Chechik, FOCS 2018] with near optimal guarantees for the oblivious-adversary setting. Unfortunately, adaptive-adversary setting is still poorly understood. For unweighted graphs, the algorithm of [Henzinger, Krinninger and Nanongkai, FOCS 13, SICOMP 16] achieves a $(1+epsilon)$-approximation with total update time $tilde O(mn/epsilon)$; the best current total update time of $n^{2.5+O(epsilon)}$ is achieved by the deterministic algorithm of [Chuzhoy, Saranurak, SODA21], with $2^{O(1/epsilon)}$-multiplicative and $2^{O(log^{3/4}n/epsilon)}$-additive approximation. To the best of our knowledge, for arbitrary non-negative edge weights, the fastest current adaptive-update algorithm has total update time $O(n^{3}log L/epsilon)$, achieving a $(1+epsilon)$-approximation. Here, L is the ratio of longest to shortest edge lengths. Our main result is a deterministic algorithm for decremental APSP in undirected edge-weighted graphs, that, for any $Omega(1/loglog m)leq epsilon< 1$, achieves approximation factor $(log m)^{2^{O(1/epsilon)}}$, with total update time $Oleft (m^{1+O(epsilon)}cdot (log m)^{O(1/epsilon^2)}cdot log Lright )$.
In the decremental single-source shortest paths problem, the goal is to maintain distances from a fixed source $s$ to every vertex $v$ in an $m$-edge graph undergoing edge deletions. In this paper, we conclude a long line of research on this problem by showing a near-optimal deterministic data structure that maintains $(1+epsilon)$-approximate distance estimates and runs in $m^{1+o(1)}$ total update time. Our result, in particular, removes the oblivious adversary assumption required by the previous breakthrough result by Henzinger et al. [FOCS14], which leads to our second result: the first almost-linear time algorithm for $(1-epsilon)$-approximate min-cost flow in undirected graphs where capacities and costs can be taken over edges and vertices. Previously, algorithms for max flow with vertex capacities, or min-cost flow with any capacities required super-linear time. Our result essentially completes the picture for approximate flow in undirected graphs. The key technique of the first result is a novel framework that allows us to treat low-diameter graphs like expanders. This allows us to harness expander properties while bypassing shortcomings of expander decomposition, which almost all previous expander-based algorithms needed to deal with. For the second result, we break the notorious flow-decomposition barrier from the multiplicative-weight-update framework using randomization.
Hierarchical matrices are space and time efficient representations of dense matrices that exploit the low rank structure of matrix blocks at different levels of granularity. The hierarchically low rank block partitioning produces representations that can be stored and operated on in near-linear complexity instead of the usual polynomial complexity of dense matrices. In this paper, we present high performance implementations of matrix vector multiplication and compression operations for the $mathcal{H}^2$ variant of hierarchical matrices on GPUs. This variant exploits, in addition to the hierarchical block partitioning, hierarchical bases for the block representations and results in a scheme that requires only $O(n)$ storage and $O(n)$ complexity for the mat-vec and compression kernels. These two operations are at the core of algebraic operations for hierarchical matrices, the mat-vec being a ubiquitous operation in numerical algorithms while compression/recompression represents a key building block for other algebraic operations, which require periodic recompression during execution. The difficulties in developing efficient GPU algorithms come primarily from the irregular tree data structures that underlie the hierarchical representations, and the key to performance is to recast the computations on flattened trees in ways that allow batched linear algebra operations to be performed. This requires marshaling the irregularly laid out data in a way that allows them to be used by the batched routines. Marshaling operations only involve pointer arithmetic with no data movement and as a result have minimal overhead. Our numerical results on covariance matrices from 2D and 3D problems from spatial statistics show the high efficiency our routines achieve---over 550GB/s for the bandwidth-limited mat-vec and over 850GFLOPS/s in sustained performance for the compression on the P100 Pascal GPU.