Recently, great effort has been dedicated to research on the management of large-scale graph-based data such as the WWW, social networks, and biological networks. In the study of graph-based data management, the node-disjoint subgraph homeomorphism relation between graphs is more suitable than (sub)graph isomorphism in many cases, especially where node skipping and node mismatching are allowed. However, no efficient algorithms for node-disjoint subgraph homeomorphism determination (ndSHD) have been available. In this paper, we propose two computationally efficient ndSHD algorithms based on state-space search with backtracking, which employ several heuristics to prune the search space. Experimental results on synthetic data sets show that the proposed algorithms are efficient, require relatively little time in most of the test cases, scale to large or dense graphs, and accommodate more complex fuzzy matching cases.
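As a rough illustration of the state-space search idea (not the authors' actual algorithms or heuristics), the following Python sketch maps pattern vertices to distinct data vertices by backtracking, then tries to realize each pattern edge as a path whose internal vertices are unused (this is the "node skipping"); the greedy path routing means it can miss solutions that a full search over path choices would find.

```python
from collections import deque

def bfs_path(adj, src, dst, banned):
    """Shortest path from src to dst whose internal vertices avoid `banned`."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in prev and (v == dst or v not in banned):
                prev[v] = u
                queue.append(v)
    return None

def nd_shd(pattern_nodes, pattern_edges, adj):
    """Backtracking state-space search: map pattern vertices to distinct data
    vertices, then realize each pattern edge as a path whose internal vertices
    are unused and mutually disjoint. Path routing here is greedy, so this
    sketch is not complete, unlike the algorithms described in the paper."""
    def backtrack(i, mapping, used):
        if i == len(pattern_nodes):
            banned = set(used)
            for pu, pv in pattern_edges:
                path = bfs_path(adj, mapping[pu], mapping[pv], banned)
                if path is None:
                    return None
                banned.update(path[1:-1])  # reserve internal vertices
            return dict(mapping)
        for v in adj:
            if v not in used:
                mapping[pattern_nodes[i]] = v
                used.add(v)
                found = backtrack(i + 1, mapping, used)
                if found is not None:
                    return found
                used.discard(v)
                del mapping[pattern_nodes[i]]
        return None
    return backtrack(0, {}, set())
```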
Subgraph counting is a fundamental problem in analyzing massive graphs, often studied in the context of social and complex networks. There is a rich literature on designing efficient, accurate, and scalable algorithms for this problem. In this work, we tackle this challenge and design several new algorithms for subgraph counting in the Massively Parallel Computation (MPC) model: Given a graph $G$ over $n$ vertices, $m$ edges and $T$ triangles, our first main result is an algorithm that, with high probability, outputs a $(1+\varepsilon)$-approximation to $T$, with optimal round and space complexity provided any $S \geq \max\{\sqrt{m}, n^2/m\}$ space per machine, assuming $T=\Omega(\sqrt{m/n})$. Our second main result is an $\tilde{O}_{\delta}(\log \log n)$-round algorithm for exactly counting the number of triangles, parametrized by the arboricity $\alpha$ of the input graph. The space per machine is $O(n^{\delta})$ for any constant $\delta$, and the total space is $O(m\alpha)$, which matches the time complexity of (combinatorial) triangle counting in the sequential model. We also prove that this result can be extended to exactly counting $k$-cliques for any constant $k$, with the same round complexity and total space $O(m\alpha^{k-2})$. Alternatively, allowing $O(\alpha^2)$ space per machine, the total space requirement reduces to $O(n\alpha^2)$. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most $5$ can be implemented in the MPC model in $\tilde{O}_{\delta}(\sqrt{\log n})$ rounds, $O(n^{\delta})$ space per machine and $O(m\alpha^3)$ total space. Therefore, this result also exhibits the phenomenon that a time bound in the sequential model translates to a space bound in the MPC model.
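For context on the sequential baseline that the $O(m\alpha)$ total-space bound is compared against, here is a textbook-style combinatorial triangle counter in Python (a sketch of the classic degree-ordering technique, not the paper's MPC algorithms): orient each edge toward the later vertex in a degree ordering and intersect out-neighborhoods, giving $O(m\alpha)$-type running time.

```python
def count_triangles(adj):
    """Sequential combinatorial triangle counting: orient each edge toward
    the later vertex in a degree ordering, then intersect out-neighborhoods.
    Each triangle is counted exactly once, at its lowest-ordered vertex."""
    order = {v: i for i, v in enumerate(sorted(adj, key=lambda v: len(adj[v])))}
    out = {v: {u for u in adj[v] if order[u] > order[v]} for v in adj}
    total = 0
    for v in adj:
        for u in out[v]:
            total += len(out[v] & out[u])
    return total

# Example: a 4-cycle with one chord contains exactly 2 triangles.
g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
assert count_triangles(g) == 2
```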
We study the classical Node-Disjoint Paths (NDP) problem: given an $n$-vertex graph $G$ and a collection $M=\{(s_1,t_1),\ldots,(s_k,t_k)\}$ of pairs of vertices of $G$, called demand pairs, find a maximum-cardinality set of node-disjoint paths connecting the demand pairs. NDP is one of the most basic routing problems and has been studied extensively. Despite this, there are still wide gaps in our understanding of its approximability: the best currently known upper bound of $O(\sqrt{n})$ on its approximation ratio is achieved via a simple greedy algorithm, while the best current negative result shows that the problem does not admit a better than $\Omega(\log^{1/2-\delta} n)$-approximation for any constant $\delta$, under standard complexity assumptions. Even for planar graphs no better approximation algorithms are known, and to the best of our knowledge, the best negative bound is APX-hardness. Perhaps the biggest obstacle to obtaining better approximation algorithms for NDP is that most currently known approximation algorithms for this type of problem rely on the standard multicommodity flow relaxation, whose integrality gap is $\Omega(\sqrt{n})$ for NDP, even in planar graphs. In this paper, we break the barrier of $O(\sqrt{n})$ on the approximability of the NDP problem in planar graphs and obtain an $\tilde{O}(n^{9/19})$-approximation. We introduce a new linear programming relaxation of the problem, together with a number of new techniques, that we hope will be helpful in designing more powerful algorithms for this and related problems.
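The simple greedy baseline mentioned above can be sketched as follows (a minimal Python illustration assuming an adjacency-dict graph representation; the paper's contribution is the new LP-based planar algorithm, not this baseline): repeatedly route the demand pair admitting the currently shortest path, then delete that path's vertices.

```python
from collections import deque

def shortest_path(adj, s, t, alive):
    """BFS shortest path from s to t using only vertices in `alive`."""
    prev = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v in alive and v not in prev:
                prev[v] = u
                queue.append(v)
    return None

def greedy_ndp(adj, demand_pairs):
    """Greedy NDP: route the demand pair with the currently shortest
    connecting path, remove that path's vertices from the graph, repeat."""
    alive = set(adj)
    routed = []
    while True:
        best = None
        for s, t in demand_pairs:
            if s in alive and t in alive:
                p = shortest_path(adj, s, t, alive)
                if p is not None and (best is None or len(p) < len(best)):
                    best = p
        if best is None:
            return routed
        routed.append(best)
        alive -= set(best)
```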
We study the classical NP-hard problems of finding maximum-size subsets from given sets of $k$ terminal pairs that can be routed via edge-disjoint paths (MaxEDP) or node-disjoint paths (MaxNDP) in a given graph. The approximability of MaxEDP/NDP is currently not well understood; the best known lower bound is $\Omega(\log^{1/2-\epsilon} n)$, assuming NP $\not\subseteq$ ZPTIME$(n^{\mathrm{poly}\log n})$. This constitutes a significant gap to the best known approximation upper bound of $O(\sqrt{n})$ due to Chekuri et al. (2006), and closing this gap is currently one of the big open problems in approximation algorithms. In their seminal paper, Raghavan and Thompson (Combinatorica, 1987) introduced the technique of randomized rounding for LPs; their technique gives an $O(1)$-approximation when edges (or nodes) may be used by $O(\frac{\log n}{\log\log n})$ paths. In this paper, we strengthen the above fundamental results. We provide new bounds formulated in terms of the feedback vertex set number $r$ of a graph, which measures its vertex deletion distance to a forest. In particular, we obtain the following.
* For MaxEDP, we give an $O(\sqrt{r}\cdot \log^{1.5} kr)$-approximation algorithm. As $r\leq n$, up to logarithmic factors, our result strengthens the best known ratio $O(\sqrt{n})$ due to Chekuri et al.
* Further, we show how to route $\Omega(\mathrm{OPT})$ pairs with congestion $O(\frac{\log kr}{\log\log kr})$, strengthening the bound obtained by the classic approach of Raghavan and Thompson (see the rounding sketch below).
* For MaxNDP, we give an algorithm that computes the optimal answer in time $(k+r)^{O(r)}\cdot n$. If $r$ is at most triple-exponential in $k$, this improves the best known algorithm for MaxNDP with parameter $k$, by Kawarabayashi and Wollan (STOC 2010).
We complement these positive results by proving that MaxEDP is NP-hard even for $r=1$, and MaxNDP is W$[1]$-hard for parameter $r$.
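The Raghavan-Thompson randomized rounding step referenced above can be sketched as follows (a minimal Python illustration assuming the LP solution has already been decomposed into per-pair lists of (path, fractional value); solving and decomposing the LP itself is omitted): route each pair with probability equal to its total fractional value, choosing among its paths proportionally. Concentration bounds then control the congestion of the rounded solution.

```python
import random

def randomized_rounding(path_decomposition):
    """Rounding sketch: `path_decomposition` maps each demand pair i to a
    list of (path, frac) with fracs summing to x_i <= 1, taken from an LP
    solution. Pair i is routed with probability x_i, on a path chosen with
    probability proportional to its fractional value."""
    chosen = {}
    for i, paths in path_decomposition.items():
        r = random.random()
        acc = 0.0
        for path, frac in paths:
            acc += frac
            if r < acc:
                chosen[i] = path
                break
        # with probability 1 - x_i, pair i stays unrouted
    return chosen
```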
Let $A$ and $B$ be two point sets in the plane of sizes $r$ and $n$ respectively (assume $r \leq n$), and let $k$ be a parameter. A matching between $A$ and $B$ is a family of pairs in $A \times B$ such that any point of $A \cup B$ appears in at most one pair. Given two positive integers $p$ and $q$, we define the cost of a matching $M$ to be $c(M) = \sum_{(a,b) \in M} \|a-b\|_p^q$, where $\|\cdot\|_p$ is the $L_p$-norm. The geometric partial matching problem asks to find the minimum-cost size-$k$ matching between $A$ and $B$. We present efficient algorithms for the geometric partial matching problem that work for any power of an $L_p$-norm matching objective: an exact algorithm that runs in $O((n + k^2)\,\mathrm{polylog}\, n)$ time, and a $(1 + \varepsilon)$-approximation algorithm that runs in $O((n + k\sqrt{k})\,\mathrm{polylog}\, n \cdot \log \varepsilon^{-1})$ time. Both algorithms are based on the primal-dual flow augmentation scheme; the main improvements involve using dynamic data structures to achieve efficient flow augmentations. With similar techniques, we give an exact algorithm for the planar transportation problem running in $O(\min\{n^2, rn^{3/2}\}\,\mathrm{polylog}\, n)$ time.
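To make the objective concrete, the following Python sketch evaluates $c(M)$ and exhaustively solves tiny instances by brute force; it is only a reference for checking the definition, not the paper's primal-dual flow-augmentation algorithms, which are vastly more efficient.

```python
from itertools import combinations, permutations

def cost(M, p, q):
    """c(M) = sum over (a, b) in M of ||a - b||_p ** q."""
    return sum(
        sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (q / p)
        for a, b in M
    )

def brute_force_partial_matching(A, B, k, p=2, q=1):
    """Exhaustive reference solver for tiny instances: try every size-k
    matching between A and B and return the minimum cost."""
    best = None
    for As in combinations(A, k):
        for Bs in permutations(B, k):
            c = cost(list(zip(As, Bs)), p, q)
            if best is None or c < best:
                best = c
    return best

# Example: the optimal size-1 matching pairs the two closest points.
A = [(0.0, 0.0), (5.0, 5.0)]
B = [(1.0, 0.0), (9.0, 9.0)]
print(brute_force_partial_matching(A, B, k=1))  # 1.0 (Euclidean distance)
```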
Schietgat, Ramon and Bruynooghe proposed a polynomial-time algorithm for computing a maximum common subgraph under the block-and-bridge preserving subgraph isomorphism (BBP-MCS) for outerplanar graphs. We show that the article contains the following errors: (i) The running time of the presented approach is claimed to be $\mathcal{O}(n^{2.5})$ for two graphs of order $n$. We show that the algorithm of the authors allows no better bound than $\mathcal{O}(n^4)$ when using state-of-the-art general-purpose methods to solve the matching instances arising as subproblems. This is even true for the special case where both input graphs are trees. (ii) The article suggests that the dissimilarity measure derived from BBP-MCS is a metric. We show that the triangle inequality is not always satisfied and, hence, it is not a metric. Therefore, the dissimilarity measure should not be used in combination with techniques that rely on or exploit the triangle inequality in any way. Where possible, we give hints on techniques that are suitable for improving the algorithm.
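A dissimilarity that fails the triangle inequality can also be caught empirically. The following generic Python helper (a hypothetical utility, not from the article) searches a pairwise dissimilarity matrix for a violating triple, which is useful before plugging any dissimilarity, such as the BBP-MCS-derived one discussed above, into methods that assume metric properties.

```python
def triangle_inequality_violation(d, eps=1e-12):
    """Given a symmetric dissimilarity matrix d (list of lists), return a
    witness (i, j, k) with d[i][k] > d[i][j] + d[j][k], or None if the
    triangle inequality holds on this sample. `eps` absorbs float noise."""
    n = len(d)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if d[i][k] > d[i][j] + d[j][k] + eps:
                    return (i, j, k)
    return None

# Example: d[0][2] = 10 > d[0][1] + d[1][2] = 2, so (0, 1, 2) is a witness.
d = [[0, 1, 10], [1, 0, 1], [10, 1, 0]]
print(triangle_inequality_violation(d))  # (0, 1, 2)
```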