Random walk (RW) is a powerful tool in many graph processing, mining, and learning applications. This paper proposes ThunderRW, an efficient in-memory random walk engine. Whereas existing parallel systems focus on improving the performance of a single graph operation, ThunderRW supports massively parallel random walks. The core design of ThunderRW is motivated by our profiling results: in common RW algorithms, as many as 73.1% of CPU pipeline slots are stalled due to irregular memory access, significantly more memory stalls than in conventional graph workloads such as BFS and SSSP. To improve memory efficiency, we first design a generic step-centric programming model named Gather-Move-Update to abstract different RW algorithms. Based on this programming model, we develop the step interleaving technique, which hides memory access latency by switching between the executions of different random walk queries. In our experiments, we use four representative RW algorithms, namely PPR, DeepWalk, Node2Vec, and MetaPath, to demonstrate the efficiency and programming flexibility of ThunderRW. Experimental results show that ThunderRW outperforms state-of-the-art approaches by an order of magnitude, and that step interleaving reduces the fraction of stalled CPU pipeline slots from 73.1% to 15.0%.
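The step-centric model and step interleaving can be illustrated with a small sketch. The split into `gather`, `move`, and `update` functions below is an assumption about the roles of Gather-Move-Update (Gather collects the transition weights of the current vertex's edges, Move samples the next vertex, Update records the step and decides whether the walk continues); it is not ThunderRW's actual API. Each walk runs as a Python generator that yields at every step boundary, and a round-robin scheduler switches among walks. In a native engine, the yield point is roughly where a prefetch of the next vertex's adjacency list would be issued so that its latency overlaps with other queries' work. Python cannot actually hide memory latency this way, so the sketch shows only the control flow.

```python
import random
from collections import deque


def gather(graph, query):
    """Gather: collect (neighbor, weight) pairs for the current vertex.
    Static weights here (DeepWalk-like); Node2Vec would instead derive
    weights from the query's previous vertex."""
    return graph[query["cur"]]


def move(edges, rng):
    """Move: sample the next vertex proportionally to the gathered weights."""
    if not edges:
        return None  # dead end
    neighbors = [v for v, _ in edges]
    weights = [w for _, w in edges]
    return rng.choices(neighbors, weights=weights, k=1)[0]


def update(query, nxt):
    """Update: record the step; return True if the walk should continue."""
    query["cur"] = nxt
    query["path"].append(nxt)
    return len(query["path"]) - 1 < query["length"]


def walk_stages(graph, query, rng):
    """One random-walk query as a coroutine. Each `yield` marks a switch
    point where a native engine would prefetch graph[query["cur"]]."""
    alive = query["length"] > 0
    while alive:
        yield
        nxt = move(gather(graph, query), rng)
        alive = nxt is not None and update(query, nxt)


def run_interleaved(graph, sources, length, seed=0):
    """Advance many walks one step at a time in round-robin order."""
    rng = random.Random(seed)
    queries = [{"cur": s, "path": [s], "length": length} for s in sources]
    ring = deque(walk_stages(graph, q, rng) for q in queries)
    while ring:
        task = ring.popleft()
        try:
            next(task)
            ring.append(task)  # not finished: reschedule behind the others
        except StopIteration:
            pass  # walk terminated (length reached or dead end)
    return [q["path"] for q in queries]
```

For example, `run_interleaved({0: [(1, 1.0)], 1: [(0, 1.0)]}, [0, 1], length=3)` alternates between the two walks at every step and returns two walks of four vertices each.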
As large-scale graph processing emerges, we observe a costly fork-processing pattern (FPP) that is common in many graph algorithms. The distinguishing feature of an FPP is that it launches many independent queries from different source vertices on the same graph. For example, an algorithm for analyzing the network community profile can execute Personalized PageRank queries from tens of thousands of source vertices at the same time. We study the efficiency of handling FPPs in state-of-the-art graph processing systems on multi-core architectures and find that these systems suffer a severe cache miss penalty because of the irregular and uncoordinated memory accesses involved in processing FPPs. In this paper, we propose ForkGraph, a cache-efficient FPP processing system for multi-core architectures. To improve cache reuse, we divide the graph into partitions, each sized to the LLC capacity, and buffer and execute the queries of an FPP on a per-partition basis. We further develop efficient intra- and inter-partition execution strategies. For intra-partition processing, since each graph partition fits in the LLC, we execute each graph query with an efficient sequential algorithm (in contrast with the parallel algorithms used in existing parallel graph processing systems) and present atomic-free query processing that consolidates contending operations on the cache-resident graph partition. For inter-partition processing, we propose yielding and priority-based scheduling to reduce redundant work. Moreover, we prove theoretically that ForkGraph performs the same amount of work, to within a constant factor, as the fastest known sequential algorithms for processing FPP queries; that is, it is work-efficient. Our evaluations on real-world graphs show that ForkGraph significantly outperforms state-of-the-art graph processing systems, with speedups of two orders of magnitude.
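Partition-based execution can be sketched for an FPP made of single-source shortest-path queries. Every detail here is an illustrative assumption, not ForkGraph's implementation: the `part` mapping stands in for LLC-sized partitions, pending work is buffered per partition as `(query, vertex, distance)` messages, the partition with the most buffered messages is scheduled next (a hypothetical priority rule), and within a partition each query runs a sequential Dijkstra restricted to that partition. Grouping messages by query, so that one sequential routine applies all of a query's updates, is only a rough analogue of atomic-free processing. Because distance labels only ever decrease, this label-correcting scheme converges to exact distances for nonnegative edge weights.

```python
import heapq
from collections import defaultdict

INF = float("inf")


def local_dijkstra(graph, part, p, q, dq, msgs, buffers):
    """Sequential Dijkstra for query q restricted to partition p. Relaxations
    that leave p are buffered as messages at the target partition."""
    heap = []
    for d, v in msgs:
        if d < dq.get(v, INF):
            dq[v] = d
            heapq.heappush(heap, (d, v))
    while heap:
        d, v = heapq.heappop(heap)
        if d > dq[v]:
            continue  # stale heap entry
        for u, w in graph[v]:
            nd = d + w
            if nd >= dq.get(u, INF):
                continue
            if part[u] == p:
                dq[u] = nd
                heapq.heappush(heap, (nd, u))
            else:
                buffers[part[u]].append((q, u, nd))


def fork_sssp(graph, part, sources):
    """Answer one SSSP query per source, one partition at a time.
    graph: vertex -> list of (neighbor, nonnegative weight)
    part:  vertex -> partition id (hypothetically LLC-sized)"""
    dist = [dict() for _ in sources]       # per-query distance labels
    buffers = defaultdict(list)            # partition -> [(query, vertex, d)]
    for q, s in enumerate(sources):
        buffers[part[s]].append((q, s, 0))

    while buffers:
        # Hypothetical priority rule: the partition with the most pending work.
        p = max(buffers, key=lambda k: len(buffers[k]))
        pending = buffers.pop(p)
        by_query = defaultdict(list)
        for q, v, d in pending:
            by_query[q].append((d, v))
        # Each query's updates are applied sequentially by one routine.
        for q, msgs in by_query.items():
            local_dijkstra(graph, part, p, q, dist[q], msgs, buffers)
    return dist
```

Only edges that cross partitions generate buffered messages, so each partition's vertices and edges are reused by every query with pending work before the scheduler moves to another partition.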
We study the hop-constrained s-t path enumeration (HcPE) problem, which takes as input a graph $G$, two distinct vertices $s,t$, and a hop constraint $k$, and outputs all paths from $s$ to $t$ whose length is at most $k$. On workloads with a large search space, state-of-the-art algorithms suffer from severe performance issues caused by costly pruning operations during enumeration. Consequently, they can hardly meet the real-time constraints of many online applications. In this paper, we propose PathEnum, an efficient index-based algorithm for real-time HcPE. For an input query, PathEnum first builds a lightweight index that reduces the number of edges involved in the enumeration, and then applies efficient index-based enumeration approaches: one based on depth-first search and the other based on joins. We further develop a query optimizer, built on a join-based cost model, to optimize the search order. We conduct experiments on 15 real-world graphs. The results show that PathEnum outperforms state-of-the-art approaches by orders of magnitude in query time, throughput, and response time.
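A DFS-based variant of this idea can be sketched as follows, assuming a directed, unweighted graph given as adjacency lists and paths that repeat no vertex. The "index" here is a hypothetical stand-in for PathEnum's: it uses hop distances from $s$ and to $t$ to keep only edges $(u, v)$ with $d_s(u) + 1 + d_t(v) \le k$. The DFS additionally prunes any extension whose remaining hop budget cannot reach $t$. This pruning is exact, because the unconstrained hop distance to $t$ is a lower bound on the length of any valid continuation. The join-based enumeration and the cost-based query optimizer are not shown.

```python
from collections import deque


def bfs_hops(adj, src, limit):
    """Hop distances from src, explored up to `limit` hops."""
    dist = {src: 0}
    frontier = deque([src])
    while frontier:
        v = frontier.popleft()
        if dist[v] == limit:
            continue
        for u in adj.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                frontier.append(u)
    return dist


def build_index(graph, s, t, k):
    """Keep only edges (u, v) that can lie on an s-t walk of at most k hops."""
    reverse = {}
    for u, nbrs in graph.items():
        for v in nbrs:
            reverse.setdefault(v, []).append(u)
    ds = bfs_hops(graph, s, k)    # hops from s
    dt = bfs_hops(reverse, t, k)  # hops to t
    index = {}
    for u, du in ds.items():
        for v in graph.get(u, ()):
            if v in dt and du + 1 + dt[v] <= k:
                index.setdefault(u, []).append(v)
    return index, dt


def enumerate_paths(graph, s, t, k):
    """Return all s-t paths with at most k edges and no repeated vertex (s != t)."""
    index, dt = build_index(graph, s, t, k)
    paths, path, on_path = [], [s], {s}

    def dfs(v):
        if v == t:
            paths.append(list(path))
            return
        for u in index.get(v, ()):
            # len(path) edges are used once u is appended; dt[u] more are needed.
            if u in on_path or len(path) + dt[u] > k:
                continue
            path.append(u)
            on_path.add(u)
            dfs(u)
            path.pop()
            on_path.discard(u)

    dfs(s)
    return paths
```

For instance, on `{0: [1, 2], 1: [2, 3], 2: [3], 3: [0]}` with $s=0$, $t=3$, $k=2$, the index drops edges `(1, 2)` and `(3, 0)`, and the enumeration returns `[0, 1, 3]` and `[0, 2, 3]`.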