We show that the edit distance between two strings of length $n$ can be computed within a factor of $f(epsilon)$ in $n^{1+epsilon}$ time as long as the edit distance is at least $n^{1-delta}$ for some $delta(epsilon) > 0$.
Edit distance is a measure of similarity of two strings based on the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. The edit distance can be computed exactly using a dynamic progr
amming algorithm that runs in quadratic time. Andoni, Krauthgamer, and Onak (2010) gave a nearly linear time algorithm that approximates edit distance within an approximation factor $text{poly}(log n)$. In this paper, we provide an algorithm with running time $tilde{O}(n^{2-2/7})$ that approximates the edit distance within a constant factor.
Diffusion is a fundamental graph procedure and has been a basic building block in a wide range of theoretical and empirical applications such as graph partitioning and semi-supervised learning on graphs. In this paper, we study computationally effici
ent diffusion primitives beyond random walk. We design an $widetilde{O}(m)$-time randomized algorithm for the $ell_2$-norm flow diffusion problem, a recently proposed diffusion model based on network flow with demonstrated graph clustering related applications both in theory and in practice. Examples include finding locally-biased low conductance cuts. Using a known connection between the optimal dual solution of the flow diffusion problem and the local cut structure, our algorithm gives an alternative approach for finding such cuts in nearly linear time. From a technical point of view, our algorithm contributes a novel way of dealing with inequality constraints in graph optimization problems. It adapts the high-level algorithmic framework of nearly linear time Laplacian system solvers, but requires several new tools: vertex elimination under constraints, a new family of graph ultra-sparsifiers, and accelerated proximal gradient methods with inexact proximal mapping computation.
We consider the problem of efficiently scheduling jobs with precedence constraints on a set of identical machines in the presence of a uniform communication delay. In this setting, if two precedence-constrained jobs $u$ and $v$, with ($u prec v$), ar
e scheduled on different machines, then $v$ must start at least $rho$ time units after $u$ completes. The scheduling objective is to minimize makespan, i.e. the total time between when the first job starts and the last job completes. The focus of this paper is to provide an efficient approximation algorithm with near-linear running time. We build on the algorithm of Lepere and Rapine [STACS 2002] for this problem to give an $Oleft(frac{ln rho}{ln ln rho} right)$-approximation algorithm that runs in $tilde{O}(|V| + |E|)$ time.
We consider the problem of center-based clustering in low-dimensional Euclidean spaces under the perturbation stability assumption. An instance is $alpha$-stable if the underlying optimal clustering continues to remain optimal even when all pairwise
distances are arbitrarily perturbed by a factor of at most $alpha$. Our main contribution is in presenting efficient exact algorithms for $alpha$-stable clustering instances whose running times depend near-linearly on the size of the data set when $alpha ge 2 + sqrt{3}$. For $k$-center and $k$-means problems, our algorithms also achieve polynomial dependence on the number of clusters, $k$, when $alpha geq 2 + sqrt{3} + epsilon$ for any constant $epsilon > 0$ in any fixed dimension. For $k$-median, our algorithms have polynomial dependence on $k$ for $alpha > 5$ in any fixed dimension; and for $alpha geq 2 + sqrt{3}$ in two dimensions. Our algorithms are simple, and only require applying techniques such as local search or dynamic programming to a suitably modified metric space, combined with careful choice of data structures.
We present the first near optimal approximation schemes for the maximum weighted (uncapacitated or capacitated) $b$--matching problems for non-bipartite graphs that run in time (near) linear in the number of edges. For any $delta>3/sqrt{n}$ the
algorithm produces a $(1-delta)$ approximation in $O(m poly(delta^{-1},log n))$ time. We provide fractional solutions for the standard linear programming formulations for these problems and subsequently also provide (near) linear time approximation schemes for rounding the fractional solutions. Through these problems as a vehicle, we also present several ideas in the context of solving linear programs approximately using fast primal-dual algorithms. First, even though the dual of these problems have exponentially many variables and an efficient exact computation of dual weights is infeasible, we show that we can efficiently compute and use a sparse approximation of the dual weights using a combination of (i) adding perturbation to the constraints of the polytope and (ii) amplification followed by thresholding of the dual weights. Second, we show that approximation algorithms can be used to reduce the width of the formulation, and faster convergence.