ﻻ يوجد ملخص باللغة العربية
Graph pattern mining (GPM) is used in diverse application areas including social network analysis, bioinformatics, and chemical engineering. Existing GPM frameworks either provide high-level interfaces for productivity at the cost of expressiveness or provide low-level interfaces that can express a wide variety of GPM algorithms at the cost of increased programming complexity. Moreover, existing systems lack the flexibility to explore combinations of optimizations to achieve performance competitive with hand-optimized applications. We present Sandslash, an in-memory Graph Pattern Mining (GPM) framework that uses a novel programming interface to support productive, expressive, and efficient GPM on large graphs. Sandslash provides a high-level API that needs only a specification of the GPM problem, and it implements fast subgraph enumeration, provides efficient data structures, and applies high-level optimizations automatically. To achieve performance competitive with expert-optimized implementations, Sandslash also provides a low-level API that allows users to express algorithm-specific optimizations. This enables Sandslash to support both high-productivity and high-efficiency without losing expressiveness. We evaluate Sandslash on shared-memory machines using five GPM applications and a wide range of large real-world graphs. Experimental results demonstrate that applications written using Sandslash high-level or low-level API outperforms state-of-the-art GPM systems AutoMine, Pangolin, and Peregrine on average by 13.8x, 7.9x, and 5.4x, respectively. We also show that these Sandslash applications outperform expert-optimized GPM implementations by 2.3x on average with less programming effort.
There is growing interest in graph pattern mining (GPM) problems such as motif counting. GPM systems have been developed to provide unified interfaces for programming algorithms for these problems and for running them on parallel systems. However, ex
FP-Growth algorithm is a Frequent Pattern Min- ing (FPM) algorithm that has been extensively used to study correlations and patterns in large scale datasets. While several researchers have designed distributed memory FP-Growth algorithms, it is pivot
In this paper we study predictive pattern mining problems where the goal is to construct a predictive model based on a subset of predictive patterns in the database. Our main contribution is to introduce a novel method called safe pattern pruning (SP
Distributed data processing platforms such as MapReduce and Pregel have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms do not represent a good match for distri
The industry and academia have proposed many distributed graph processing systems. However, the existing systems are not friendly enough for users like data analysts and algorithm engineers. On the one hand, the programing models and interfaces diffe