Subgraph matching is a compute-intensive problem that asks for all isomorphic embeddings of a query graph within a data graph. This problem is generally solved with backtracking, which recursively extends every possible partial embedding until it becomes an isomorphic embedding or is found unable to become one. While existing methods reduce the search space by analyzing graph structures before starting the backtracking, this is often ineffective for complex graphs. In this paper, we propose an efficient algorithm for subgraph matching that performs on-the-fly pruning during the backtracking. Our main idea is to "learn from failure". That is, our algorithm generates failure patterns when a partial embedding is found unable to become an isomorphic embedding. Then, in the subsequent backtracking, our algorithm prunes partial embeddings that match a failure pattern. This pruning does not change the result because failure patterns are designed to represent conditions that never yield an isomorphic embedding. Additionally, we introduce an efficient representation of failure patterns for constant-time pattern matching. The experimental results show that our method improves performance by up to 10000 times over existing methods.
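The backtracking baseline described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it enumerates embeddings of an undirected, unlabeled query graph and only marks in a comment where a failure would be observed. The names `Graph` and `enumerate_embeddings`, the adjacency-set representation, and the fixed matching order are all assumptions made for the example; failure-pattern generation and matching are not implemented.

```python
from typing import Dict, List, Set

# Assumed representation: undirected graph as vertex -> set of neighbours.
Graph = Dict[int, Set[int]]


def enumerate_embeddings(query: Graph, data: Graph) -> List[Dict[int, int]]:
    """Enumerate all injective, edge-preserving mappings of query into data."""
    order = sorted(query)  # hypothetical fixed matching order
    results: List[Dict[int, int]] = []

    def extend(mapping: Dict[int, int]) -> None:
        if len(mapping) == len(order):
            results.append(dict(mapping))  # complete embedding found
            return
        u = order[len(mapping)]  # next query vertex to match
        used = set(mapping.values())
        for v in data:
            if v in used:
                continue  # keep the mapping injective
            # Every already-matched query neighbour of u must be mapped
            # to a data neighbour of v.
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                mapping[u] = v
                extend(mapping)
                del mapping[u]
        # If no candidate v led to a result, this partial embedding failed.
        # A failure-learning method would record why at this point.

    extend({})
    return results
```

For example, a triangle query matched against a data graph consisting of a triangle with one extra pendant vertex yields six embeddings, one per ordering of the triangle's vertices.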
The Subgraph Matching (SM) problem consists of finding all the embeddings of a given small graph, called the query, into a large graph, called the target. The SM problem has been widely studied for simple graphs, i.e. graphs where there is at most one edge between two nodes and each node has a single label, but few approaches have been devised for labeled multigraphs, i.e. graphs whose nodes may carry multiple labels and in which pairs of nodes may have multiple labeled edges between them. Here we present MultiRI, a novel algorithm for the Sub-Multigraph Matching (SMM) problem, i.e. subgraph matching in labeled multigraphs. MultiRI improves on the state of the art by computing compatibility domains and symmetry-breaking conditions on query nodes to filter the search space of possible solutions. Empirically, we show that MultiRI outperforms the state-of-the-art method for the SMM problem on both synthetic and real graphs, with a multiplicative speedup between five and ten for large graphs, while using a limited amount of memory.
In this paper, we propose a GPU-efficient subgraph isomorphism algorithm, GSM (Gunrock Subgraph Matching), built on the Gunrock graph analytics framework to compute graph matching on GPUs. In contrast to previous CPU approaches, which are based on depth-first traversal, GSM is BFS-based: possible matches are explored simultaneously in a breadth-first strategy. The advantage of BFS-based traversal is that it can leverage the massively parallel processing capabilities of the GPU. The disadvantage is that it generates more intermediate results. We propose several optimization techniques to cope with this problem. Our implementation follows a filtering-and-verification strategy. While most previous work on GPUs requires one- or two-step joining, we use one-step verification to decide the candidates in the current frontier of nodes. Our implementation achieves a speedup of up to 4x over the previous state-of-the-art GPU implementation.
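The breadth-first strategy contrasted with depth-first traversal above can be sketched sequentially as follows. This is an illustrative CPU sketch under stated assumptions, not GSM or Gunrock code: it keeps a frontier of all partial matches at the current depth and extends every one of them before moving to the next query vertex. The name `bfs_match`, the adjacency-set representation, and the fixed matching order are hypothetical. On a GPU, the loop over the frontier is the part that would be processed in parallel.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: undirected graph as vertex -> set of neighbours.
Graph = Dict[int, Set[int]]


def bfs_match(query: Graph, data: Graph) -> List[Tuple[int, ...]]:
    """Return all embeddings; position i holds the match of the i-th query vertex."""
    order = sorted(query)  # hypothetical fixed matching order
    frontier: List[Tuple[int, ...]] = [()]  # one empty partial match
    for i, u in enumerate(order):
        # Positions of earlier query vertices that are adjacent to u.
        back = [j for j in range(i) if order[j] in query[u]]
        next_frontier: List[Tuple[int, ...]] = []
        for partial in frontier:  # independent work items
            for v in data:
                if v in partial:
                    continue  # keep the mapping injective
                # Verify all edges back to already-matched vertices at once.
                if all(partial[j] in data[v] for j in back):
                    next_frontier.append(partial + (v,))
        frontier = next_frontier  # all intermediate results are materialized
    return frontier
```

The sketch makes the trade-off in the abstract visible: every level stores all surviving partial matches, which exposes parallelism but can produce many intermediate results.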
The growing popularity of dynamic applications such as social networks provides a promising way to detect valuable information in real time. Efficient analysis of high-speed data from dynamic applications is of great significance. Data from these applications can easily be modeled as streaming graphs. In this paper, we study subgraph (isomorphism) search over streaming graph data that obeys timing-order constraints on the occurrence of edges in the stream. We propose a data structure and an algorithm to answer subgraph search efficiently, introduce optimizations that greatly reduce the space cost, and propose concurrency management to improve system throughput. Extensive experiments on real network traffic data and synthetic social streaming data confirm the efficiency and effectiveness of our solution.
Many studies have sought efficient solutions for subgraph similarity search over certain (deterministic) graphs because of its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy-preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Unlike previous works, which assume that edges in an uncertain graph are independent of each other, we study uncertain graphs in which edge occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete; thus, we employ a filter-and-verify framework to speed up the search. In the filtering phase, we develop tight lower and upper bounds on the subgraph similarity probability based on a probabilistic matrix index, PMI. PMI is composed of discriminative subgraph features associated with tight lower and upper bounds on the subgraph isomorphism probability. Based on PMI, we can filter out a large number of probabilistic graphs and maximize the pruning capability. During the verification phase, we develop an efficient sampling algorithm to validate the remaining candidates. The efficiency of our proposed solutions has been verified through extensive experiments.
Subgraph isomorphism is a well-known NP-hard problem that is widely used in many applications, such as social network analysis and queries over knowledge graphs. Due to its inherent hardness, its performance is often a bottleneck in real-world applications. We address this by designing an efficient subgraph isomorphism algorithm that leverages features of the GPU architecture, such as massive parallelism and the memory hierarchy. Existing GPU-based solutions adopt a two-step output scheme, performing the same join process twice in order to write intermediate results concurrently. They also lack GPU architecture-aware optimizations that allow scaling to large graphs. In this paper, we propose a GPU-friendly subgraph isomorphism algorithm, GSI. Unlike existing edge join-based GPU solutions, we propose a Prealloc-Combine strategy based on the vertex-oriented framework, which avoids the repeated join of existing solutions. We also propose a GPU-friendly data structure, called PCSR, to represent an edge-labeled graph. Extensive experiments on both synthetic and real graphs show that GSI outperforms the state-of-the-art algorithms by up to several orders of magnitude and scales well, handling graphs with hundreds of millions of edges.