No Arabic abstract
Subgraph isomorphism is a well-known NP-hard problem that is widely used in many applications, such as social network analysis and query over the knowledge graph. Due to the inherent hardness, its performance is often a bottleneck in various real-world applications. Therefore, we address this by designing an efficient subgraph isomorphism algorithm leveraging features of GPU architecture, such as massive parallelism and memory hierarchy. Existing GPU-based solutions adopt a two-step output scheme, performing the same join process twice in order to write intermediate results concurrently. They also lack GPU architecture-aware optimizations that allow scaling to large graphs. In this paper, we propose a GPU-friendly subgraph isomorphism algorithm, GSI. Different from existing edge join-based GPU solutions, we propose a Prealloc-Combine strategy based on the vertex-oriented framework, which avoids joining-twice in existing solutions. Also, a GPU-friendly data structure (called PCSR) is proposed to represent an edge-labeled graph. Extensive experiments on both synthetic and real graphs show that GSI outperforms the state-of-the-art algorithms by up to several orders of magnitude and has good scalability with graph size scaling to hundreds of millions of edges.
Subgraph isomorphism is a well-known NP-hard problem which is widely used in many applications, such as social network analysis and knowledge graph query. Its performance is often limited by the inherent hardness. Several insightful works have been done since 2012, mainly optimizing pruning rules and matching orders to accelerate enumerating all isomorphic subgraphs. Nevertheless, their correctness and performance are not well studied. First, different languages are used in implementation with different compilation flags. Second, experiments are not done on the same platform and the same datasets. Third, some ideas of different works are even complementary. Last but not least, there exist errors when applying some algorithms. In this paper, we address these problems by re-implementing seven representative subgraph isomorphism algorithms as well as their improv
Let $F$ be a connected graph with $ell$ vertices. The existence of a subgraph isomorphic to $F$ can be defined in first-order logic with quantifier depth no better than $ell$, simply because no first-order formula of smaller quantifier depth can distinguish between the complete graphs $K_ell$ and $K_{ell-1}$. We show that, for some $F$, the existence of an $F$ subgraph in emph{sufficiently large} connected graphs is definable with quantifier depth $ell-3$. On the other hand, this is never possible with quantifier depth better than $ell/2$. If we, however, consider definitions over connected graphs with sufficiently large treewidth, the quantifier depth can for some $F$ be arbitrarily small comparing to $ell$ but never smaller than the treewidth of $F$. Moreover, the definitions over highly connected graphs require quantifier depth strictly more than the density of $F$. Finally, we determine the exact values of these descriptive complexity parameters for all connected pattern graphs $F$ on 4 vertices.
Given a graph $F$, let $I(F)$ be the class of graphs containing $F$ as an induced subgraph. Let $W[F]$ denote the minimum $k$ such that $I(F)$ is definable in $k$-variable first-order logic. The recognition problem of $I(F)$, known as Induced Subgraph Isomorphism (for the pattern graph $F$), is solvable in time $O(n^{W[F]})$. Motivated by this fact, we are interested in determining or estimating the value of $W[F]$. Using Olarius characterization of paw-free graphs, we show that $I(K_3+e)$ is definable by a first-order sentence of quantifier depth 3, where $K_3+e$ denotes the paw graph. This provides an example of a graph $F$ with $W[F]$ strictly less than the number of vertices in $F$. On the other hand, we prove that $W[F]=4$ for all $F$ on 4 vertices except the paw graph and its complement. If $F$ is a graph on $t$ vertices, we prove a general lower bound $W[F]>(1/2-o(1))t$, where the function in the little-o notation approaches 0 as $t$ inreases. This bound holds true even for a related parameter $W^*[F]le W[F]$, which is defined as the minimum $k$ such that $I(F)$ is definable in the infinitary logic $L^k_{inftyomega}$. We show that $W^*[F]$ can be strictly less than $W[F]$. Specifically, $W^*[P_4]=3$ for $P_4$ being the path graph on 4 vertices. Using the lower bound for $W[F]$, we also obtain a succintness result for existential monadic second-order logic: A usage of just one monadic quantifier sometimes reduces the first-order quantifier depth at a super-recursive rate.
Let $v(F)$ denote the number of vertices in a fixed connected pattern graph $F$. We show an infinite family of patterns $F$ such that the existence of a subgraph isomorphic to $F$ is expressible by a first-order sentence of quantifier depth $frac23,v(F)+1$, assuming that the host graph is sufficiently large and connected. On the other hand, this is impossible for any $F$ with using less than $frac23,v(F)-2$ first-order variables.
The Subgraph Matching (SM) problem consists of finding all the embeddings of a given small graph, called the query, into a large graph, called the target. The SM problem has been widely studied for simple graphs, i.e. graphs where there is exactly one edge between two nodes and nodes have single labels, but few approaches have been devised for labeled multigraphs, i.e. graphs having possibly multiple labels on nodes in which pair of nodes may have multiple labeled edges between them. Here we present MultiRI, a novel algorithm for the Sub-Multigraph Matching (SMM) problem, i.e. subgraph matching in labeled multigraphs. MultiRI improves on the state-of-the-art by computing compatibility domains and symmetry breaking conditions on query nodes to filter the search space of possible solutions. Empirically, we show that MultiRI outperforms the state-of-the-art method for the SMM problem in both synthetic and real graphs, with a multiplicative speedup between five and ten for large graphs, by using a limited amount of memory.