No Arabic abstract
The Subgraph Matching (SM) problem consists of finding all the embeddings of a given small graph, called the query, into a large graph, called the target. The SM problem has been widely studied for simple graphs, i.e. graphs where there is exactly one edge between two nodes and nodes have single labels, but few approaches have been devised for labeled multigraphs, i.e. graphs having possibly multiple labels on nodes in which pair of nodes may have multiple labeled edges between them. Here we present MultiRI, a novel algorithm for the Sub-Multigraph Matching (SMM) problem, i.e. subgraph matching in labeled multigraphs. MultiRI improves on the state-of-the-art by computing compatibility domains and symmetry breaking conditions on query nodes to filter the search space of possible solutions. Empirically, we show that MultiRI outperforms the state-of-the-art method for the SMM problem in both synthetic and real graphs, with a multiplicative speedup between five and ten for large graphs, by using a limited amount of memory.
Subgraph matching is a compute-intensive problem that asks to enumerate all the isomorphic embeddings of a query graph within a data graph. This problem is generally solved with backtracking, which recursively evolves every possible partial embedding until it becomes an isomorphic embedding or is found unable to become it. While existing methods reduce the search space by analyzing graph structures before starting the backtracking, it is often ineffective for complex graphs. In this paper, we propose an efficient algorithm for subgraph matching that performs on-the-fly pruning during the backtracking. Our main idea is to `learn from failure. That is, our algorithm generates failure patterns when a partial embedding is found unable to become an isomorphic embedding. Then, in the subsequent process of the backtracking, our algorithm prunes partial embeddings matched with a failure pattern. This pruning does not change the result because failure patterns are designed to represent the conditions that never yield an isomorphic embedding. Additionally, we introduce an efficient representation of failure patterns for constant-time pattern matching. The experimental results show that our method improves the performance by up to 10000 times than existing methods.
In this paper, we propose a GPU-efficient subgraph isomorphism algorithm using the Gunrock graph analytic framework, GSM (Gunrock Subgraph Matching), to compute graph matching on GPUs. In contrast to previous approaches on the CPU which are based on depth-first traversal, GSM is BFS-based: possible matches are explored simultaneously in a breadth-first strategy. The advantage of using BFS-based traversal is that we can leverage the massively parallel processing capabilities of the GPU. The disadvantage is the generation of more intermediate results. We propose several optimization techniques to cope with the problem. Our implementation follows a filtering-and-verification strategy. While most previous work on GPUs requires one-/two-step joining, we use one-step verification to decide the candidates in current frontier of nodes. Our implementation has a speedup up to 4x over previous GPU state-of-the-art implementation.
Subgraph isomorphism is a well-known NP-hard problem that is widely used in many applications, such as social network analysis and query over the knowledge graph. Due to the inherent hardness, its performance is often a bottleneck in various real-world applications. Therefore, we address this by designing an efficient subgraph isomorphism algorithm leveraging features of GPU architecture, such as massive parallelism and memory hierarchy. Existing GPU-based solutions adopt a two-step output scheme, performing the same join process twice in order to write intermediate results concurrently. They also lack GPU architecture-aware optimizations that allow scaling to large graphs. In this paper, we propose a GPU-friendly subgraph isomorphism algorithm, GSI. Different from existing edge join-based GPU solutions, we propose a Prealloc-Combine strategy based on the vertex-oriented framework, which avoids joining-twice in existing solutions. Also, a GPU-friendly data structure (called PCSR) is proposed to represent an edge-labeled graph. Extensive experiments on both synthetic and real graphs show that GSI outperforms the state-of-the-art algorithms by up to several orders of magnitude and has good scalability with graph size scaling to hundreds of millions of edges.
Subgraph isomorphism is a well-known NP-hard problem which is widely used in many applications, such as social network analysis and knowledge graph query. Its performance is often limited by the inherent hardness. Several insightful works have been done since 2012, mainly optimizing pruning rules and matching orders to accelerate enumerating all isomorphic subgraphs. Nevertheless, their correctness and performance are not well studied. First, different languages are used in implementation with different compilation flags. Second, experiments are not done on the same platform and the same datasets. Third, some ideas of different works are even complementary. Last but not least, there exist errors when applying some algorithms. In this paper, we address these problems by re-implementing seven representative subgraph isomorphism algorithms as well as their improv
Within a large database G containing graphs with labeled nodes and directed, multi-edges; how can we detect the anomalous graphs? Most existing work are designed for plain (unlabeled) and/or simple (unweighted) graphs. We introduce CODETECT, the first approach that addresses the anomaly detection task for graph databases with such complex nature. To this end, it identifies a small representative set S of structural patterns (i.e., node-labeled network motifs) that losslessly compress database G as concisely as possible. Graphs that do not compress well are flagged as anomalous. CODETECT exhibits two novel building blocks: (i) a motif-based lossless graph encoding scheme, and (ii) fast memory-efficient search algorithms for S. We show the effectiveness of CODETECT on transaction graph databases from three different corporations, where existing baselines adjusted for the task fall behind significantly, across different types of anomalies and performance metrics.