No Arabic abstract
We design new serial and parallel approximation algorithms for computing a maximum weight $b$-matching in an edge-weighted graph with a submodular objective function. This problem is NP-hard; the new algorithms have approximation ratio $1/3$, and are relaxations of the Greedy algorithm that rely only on local information in the graph, making them parallelizable. We have designed and implemented Local Lazy Greedy algorithms for both serial and parallel computers. We have applied the approximate submodular $b$-matching algorithm to assign tasks to processors in the computation of Fock matrices in quantum chemistry on parallel computers. The assignment seeks to reduce the run time by balancing the computational load on the processors and bounding the number of messages that each processor sends. We show that the new assignment of tasks to processors provides a four fold speedup over the currently used assignment in the NWChemEx software on $8000$ processors on the Summit supercomputer at Oak Ridge National Lab.
The problem of maximizing a non-negative submodular function was introduced by Feige, Mirrokni, and Vondrak [FOCS07] who provided a deterministic local-search based algorithm that guarantees an approximation ratio of $frac 1 3$, as well as a randomized $frac 2 5$-approximation algorithm. An extensive line of research followed and various algorithms with improving approximation ratios were developed, all of them are randomized. Finally, Buchbinder et al. [FOCS12] presented a randomized $frac 1 2$-approximation algorithm, which is the best possible. This paper gives the first deterministic algorithm for maximizing a non-negative submodular function that achieves an approximation ratio better than $frac 1 3$. The approximation ratio of our algorithm is $frac 2 5$. Our algorithm is based on recursive composition of solutions obtained by the local search algorithm of Feige et al. We show that the $frac 2 5$ approximation ratio can be guaranteed when the recursion depth is $2$, and leave open the question of whether the approximation ratio improves as the recursion depth increases.
This paper describes a simple greedy D-approximation algorithm for any covering problem whose objective function is submodular and non-decreasing, and whose feasible region can be expressed as the intersection of arbitrary (closed upwards) covering constraints, each of which constrains at most D variables of the problem. (A simple example is Vertex Cover, with D = 2.) The algorithm generalizes previous approximation algorithms for fundamental covering problems and online paging and caching problems.
In a minimum cost submodular cover problem (MinSMC), given a monotone non-decreasing submodular function $fcolon 2^V rightarrow mathbb{Z}^+$, a cost function $c: Vrightarrow mathbb R^{+}$, an integer $kleq f(V)$, the goal is to find a subset $Asubseteq V$ with the minimum cost such that $f(A)geq k$. MinSMC has a lot of applications in machine learning and data mining. In this paper, we design a parallel algorithm for MinSMC which obtains a solution with approximation ratio at most $frac{H(min{Delta,k})}{1-5varepsilon}$ with probability $1-3varepsilon$ in $O(frac{log mlog nlog^2 mn}{varepsilon^4})$ rounds, where $Delta=max_{vin V}f(v)$, $H(cdot)$ is the Hamornic number, $n=f(V)$, $m=|V|$ and $varepsilon$ is a constant in $(0,frac{1}{5})$. This is the first paper obtaining a parallel algorithm for the weighted version of the MinSMC problem with an approximation ratio arbitrarily close to $H(min{Delta,k})$.
In this work, we give a new parallel algorithm for the problem of maximizing a non-monotone diminishing returns submodular function subject to a cardinality constraint. For any desired accuracy $epsilon$, our algorithm achieves a $1/e - epsilon$ approximation using $O(log{n} log(1/epsilon) / epsilon^3)$ parallel rounds of function evaluations. The approximation guarantee nearly matches the best approximation guarantee known for the problem in the sequential setting and the number of parallel rounds is nearly-optimal for any constant $epsilon$. Previous algorithms achieve worse approximation guarantees using $Omega(log^2{n})$ parallel rounds. Our experimental evaluation suggests that our algorithm obtains solutions whose objective value nearly matches the value obtained by the state of the art sequential algorithms, and it outperforms previous parallel algorithms in number of parallel rounds, iterations, and solution quality.
For over a decade now we have been witnessing the success of {em massive parallel computation} (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context is though: can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the {em maximum matching} problem---one of the most classic graph problems. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in $O(log{n})$ rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. showed that if each machine has $n^{1+Omega(1)}$ memory, this problem can also be solved $2$-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow up work, seem though to get stuck in a fundamental way at roughly $O(log{n})$ rounds once we enter the near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power. In this paper, we finally refute that perplexing possibility. That is, we break the above $O(log n)$ round complexity bound even in the case of {em slightly sublinear} memory per machine. In fact, our improvement here is {em almost exponential}: we are able to deliver a $(2+epsilon)$-approximation to maximum matching, for any fixed constant $epsilon>0$, in $O((log log n)^2)$ rounds.