No Arabic abstract
Subgraph isomorphism is a well-known NP-hard problem which is widely used in many applications, such as social network analysis and knowledge graph query. Its performance is often limited by the inherent hardness. Several insightful works have been done since 2012, mainly optimizing pruning rules and matching orders to accelerate enumerating all isomorphic subgraphs. Nevertheless, their correctness and performance are not well studied. First, different languages are used in implementation with different compilation flags. Second, experiments are not done on the same platform and the same datasets. Third, some ideas of different works are even complementary. Last but not least, there exist errors when applying some algorithms. In this paper, we address these problems by re-implementing seven representative subgraph isomorphism algorithms as well as their improv
Subgraph isomorphism is a well-known NP-hard problem that is widely used in many applications, such as social network analysis and query over the knowledge graph. Due to the inherent hardness, its performance is often a bottleneck in various real-world applications. Therefore, we address this by designing an efficient subgraph isomorphism algorithm leveraging features of GPU architecture, such as massive parallelism and memory hierarchy. Existing GPU-based solutions adopt a two-step output scheme, performing the same join process twice in order to write intermediate results concurrently. They also lack GPU architecture-aware optimizations that allow scaling to large graphs. In this paper, we propose a GPU-friendly subgraph isomorphism algorithm, GSI. Different from existing edge join-based GPU solutions, we propose a Prealloc-Combine strategy based on the vertex-oriented framework, which avoids joining-twice in existing solutions. Also, a GPU-friendly data structure (called PCSR) is proposed to represent an edge-labeled graph. Extensive experiments on both synthetic and real graphs show that GSI outperforms the state-of-the-art algorithms by up to several orders of magnitude and has good scalability with graph size scaling to hundreds of millions of edges.
Given a graph $F$, let $I(F)$ be the class of graphs containing $F$ as an induced subgraph. Let $W[F]$ denote the minimum $k$ such that $I(F)$ is definable in $k$-variable first-order logic. The recognition problem of $I(F)$, known as Induced Subgraph Isomorphism (for the pattern graph $F$), is solvable in time $O(n^{W[F]})$. Motivated by this fact, we are interested in determining or estimating the value of $W[F]$. Using Olarius characterization of paw-free graphs, we show that $I(K_3+e)$ is definable by a first-order sentence of quantifier depth 3, where $K_3+e$ denotes the paw graph. This provides an example of a graph $F$ with $W[F]$ strictly less than the number of vertices in $F$. On the other hand, we prove that $W[F]=4$ for all $F$ on 4 vertices except the paw graph and its complement. If $F$ is a graph on $t$ vertices, we prove a general lower bound $W[F]>(1/2-o(1))t$, where the function in the little-o notation approaches 0 as $t$ inreases. This bound holds true even for a related parameter $W^*[F]le W[F]$, which is defined as the minimum $k$ such that $I(F)$ is definable in the infinitary logic $L^k_{inftyomega}$. We show that $W^*[F]$ can be strictly less than $W[F]$. Specifically, $W^*[P_4]=3$ for $P_4$ being the path graph on 4 vertices. Using the lower bound for $W[F]$, we also obtain a succintness result for existential monadic second-order logic: A usage of just one monadic quantifier sometimes reduces the first-order quantifier depth at a super-recursive rate.
Let $v(F)$ denote the number of vertices in a fixed connected pattern graph $F$. We show an infinite family of patterns $F$ such that the existence of a subgraph isomorphic to $F$ is expressible by a first-order sentence of quantifier depth $frac23,v(F)+1$, assuming that the host graph is sufficiently large and connected. On the other hand, this is impossible for any $F$ with using less than $frac23,v(F)-2$ first-order variables.
Many studies have been conducted on seeking the efficient solution for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Different from previous works assuming that edges in an uncertain graph are independent of each other, we study the uncertain graphs where edges occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete, thus, we employ a filter-and-verify framework to speed up the search. In the filtering phase,we develop tight lower and upper bounds of subgraph similarity probability based on a probabilistic matrix index, PMI. PMI is composed of discriminative subgraph features associated with tight lower and upper bounds of subgraph isomorphism probability. Based on PMI, we can sort out a large number of probabilistic graphs and maximize the pruning capability. During the verification phase, we develop an efficient sampling algorithm to validate the remaining candidates. The efficiency of our proposed solutions has been verified through extensive experiments.
In the Maximum Common Induced Subgraph problem (henceforth MCIS), given two graphs $G_1$ and $G_2$, one looks for a graph with the maximum number of vertices being both an induced subgraph of $G_1$ and $G_2$. MCIS is among the most studied classical NP-hard problems. It remains NP-hard on many graph classes including forests. In this paper, we study the parameterized complexity of MCIS. As a generalization of textsc{Clique}, it is W[1]-hard parameterized by the size of the solution. Being NP-hard even on forests, most structural parameterizations are intractable. One has to go as far as parameterizing by the size of the minimum vertex cover to get some tractability. Indeed, when parameterized by $k := text{vc}(G_1)+text{vc}(G_2)$ the sum of the vertex cover number of the two input graphs, the problem was shown to be fixed-parameter tractable, with an algorithm running in time $2^{O(k log k)}$. We complement this result by showing that, unless the ETH fails, it cannot be solved in time $2^{o(k log k)}$. This kind of tight lower bound has been shown for a few problems and parameters but, to the best of our knowledge, not for the vertex cover number. We also show that MCIS does not have a polynomial kernel when parameterized by $k$, unless $NP subseteq mathsf{coNP}/poly$. Finally, we study MCIS and its connected variant MCCIS on some special graph classes and with respect to other structural parameters.