Research papers and master's and doctoral theses published by Lei Zou

240 - Li Zeng, Lei Zou, M. Tamer Ozsu 2019

Subgraph isomorphism is a well-known NP-hard problem that is widely used in many applications, such as social network analysis and querying knowledge graphs. Due to its inherent hardness, its performance is often a bottleneck in various real-world applications. We address this by designing an efficient subgraph isomorphism algorithm that leverages features of the GPU architecture, such as massive parallelism and the memory hierarchy. Existing GPU-based solutions adopt a two-step output scheme, performing the same join process twice in order to write intermediate results concurrently. They also lack GPU architecture-aware optimizations that allow scaling to large graphs. In this paper, we propose a GPU-friendly subgraph isomorphism algorithm, GSI. Unlike existing edge join-based GPU solutions, we propose a Prealloc-Combine strategy based on the vertex-oriented framework, which avoids the joining-twice overhead of existing solutions. We also propose a GPU-friendly data structure (called PCSR) to represent an edge-labeled graph. Extensive experiments on both synthetic and real graphs show that GSI outperforms the state-of-the-art algorithms by up to several orders of magnitude and scales well to graphs with hundreds of millions of edges.
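For readers unfamiliar with compressed sparse row (CSR) layouts, the following is a minimal CPU-side sketch of one way to store an edge-labeled graph: one CSR (offsets plus targets) per edge label. It is not the PCSR structure from the paper, whose layout the abstract does not describe; the names `build_labeled_csr` and `neighbors` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_labeled_csr(num_vertices, edges):
    """Build one CSR per edge label (illustrative only, not PCSR).

    edges: iterable of (src, dst, label) with vertex ids in [0, num_vertices).
    Returns {label: (offsets, targets)}.
    """
    by_label = defaultdict(list)
    for src, dst, label in edges:
        by_label[label].append((src, dst))

    csr = {}
    for label, pairs in by_label.items():
        # Count out-degrees, then prefix-sum them into row offsets.
        offsets = [0] * (num_vertices + 1)
        for src, _ in pairs:
            offsets[src + 1] += 1
        for v in range(num_vertices):
            offsets[v + 1] += offsets[v]

        # Scatter each target into the next free slot of its source row.
        targets = [0] * len(pairs)
        cursor = offsets[:-1].copy()
        for src, dst in pairs:
            targets[cursor[src]] = dst
            cursor[src] += 1
        csr[label] = (offsets, targets)
    return csr


def neighbors(csr, vertex, label):
    """Return the out-neighbors of `vertex` reachable via edges with `label`."""
    if label not in csr:
        return []
    offsets, targets = csr[label]
    return targets[offsets[vertex]:offsets[vertex + 1]]
```

Because each row is a contiguous slice of `targets`, a neighbor lookup for a given label touches one compact memory range, which is the property that makes CSR-style layouts attractive for parallel hardware.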

Databases

Accelerating Partial Evaluation in Distributed SPARQL Query Evaluation

114 - Peng Peng, Lei Zou, Runyu Guan 2019

Partial evaluation has recently been used for processing SPARQL queries over a large resource description framework (RDF) graph in a distributed environment. However, the previous approach is inefficient when dealing with complex queries. In this study, we further improve the partial evaluation and assembly framework for answering SPARQL queries over a distributed RDF graph. Our key idea is to exploit the intrinsic structural characteristics of partial matches to filter out irrelevant partial results, while providing performance guarantees on the network trace (data shipment) or the computational cost (response time). We also propose an efficient assembly algorithm that uses the characteristics of partial matches to merge them and form the final results. To further improve the efficiency of finding partial matches, we propose an optimization that communicates variable candidates among sites to avoid redundant computation. In addition, although our approach is partitioning-tolerant, different partitioning strategies yield different performance, so we evaluate several partitioning strategies for our approach. Experiments over both real and synthetic RDF datasets confirm the superiority of our approach.

Distributed, Parallel, and Cluster Computing

Dolha - an Efficient and Exact Data Structure for Streaming Graphs

357 - Fan Zhang, Lei Zou, Li Zeng 2019

A streaming graph is a graph formed by a sequence of incoming edges with time stamps. Unlike static graphs, a streaming graph is highly dynamic and time related. In the real world, high-volume, high-velocity streaming graphs such as internet traffic data, social network communication data, and financial transfer data pose challenges to classic graph data structures. We present a new data structure, double orthogonal list in hash table (Dolha), which is a high-speed and memory-efficient graph structure applicable to streaming graphs. Dolha has constant time cost per edge and near-linear space cost, so it can hold the information of billions of edges in memory and process an incoming edge in nanoseconds. Dolha also has linear time cost for neighborhood queries, which allows it to support most graph algorithms without extra cost. We also present a persistent structure based on Dolha that can handle sliding window updates and time-related queries.
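To make the cost claims concrete, here is a minimal sketch of a hash-table-backed streaming graph with expected constant-time edge insertion and neighborhood queries that are linear in the vertex degree. It is not Dolha (the abstract does not give the double orthogonal list layout); the class `StreamingGraph` and its methods are hypothetical names, and the design simply uses nested Python dictionaries as a stand-in.

```python
from collections import defaultdict


class StreamingGraph:
    """Illustrative hash-based streaming graph store (not Dolha itself)."""

    def __init__(self):
        # src -> {dst: latest timestamp}, and the mirrored in-edge index.
        self.out_edges = defaultdict(dict)
        self.in_edges = defaultdict(dict)

    def add_edge(self, src, dst, timestamp):
        """Insert or refresh an edge in expected O(1) time."""
        self.out_edges[src][dst] = timestamp
        self.in_edges[dst][src] = timestamp

    def edge_time(self, src, dst):
        """Return the latest timestamp of (src, dst), or None if absent."""
        return self.out_edges.get(src, {}).get(dst)

    def successors(self, vertex):
        """Out-neighbors of `vertex`, in time linear in its out-degree."""
        return list(self.out_edges.get(vertex, {}))

    def predecessors(self, vertex):
        """In-neighbors of `vertex`, in time linear in its in-degree."""
        return list(self.in_edges.get(vertex, {}))
```

Keeping both an out-edge and an in-edge index doubles the memory per edge but lets successor and predecessor queries run without scanning the whole graph.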

Data Structures and Algorithms

Fast and Accurate Graph Stream Summarization

420 - Xiangyang Gou, Lei Zou, Chenxingyu Zhao 2018

A graph stream is a continuous sequence of data items, in which each item indicates an edge, including its two endpoints and edge weight. It forms a dynamic graph that changes with every item in the stream. Graph streams play important roles in cyber security, social networks, cloud troubleshooting systems, and other fields. Due to the vast volume and high update speed of graph streams, traditional data structures for graph storage such as the adjacency matrix and the adjacency list are no longer sufficient. However, prior work on graph stream summarization, such as CM sketches, gSketches, TCM, and gMatrix, either supports only limited kinds of queries or suffers from poor accuracy of query results. In this paper, we propose a novel Graph Stream Sketch (GSS for short) to summarize graph streams. GSS has linear space cost (O(|E|), where E is the edge set of the graph) and constant update time complexity (O(1)), and it supports all kinds of queries over graph streams with controllable errors. Both theoretical analysis and experimental results confirm the superiority of our solution over the state of the art with regard to time/space complexity and query result precision.
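As background for the prior work mentioned above, the following is a minimal sketch of a Count-Min (CM) sketch applied to edge weights in a graph stream. It illustrates the baseline technique only, not GSS. The class name `EdgeCountMinSketch`, the default `width` and `depth`, and the use of BLAKE2b as the hash family are assumptions made for this example. Note that it answers edge weight queries only and keeps no topology, which is the kind of query limitation the abstract refers to.

```python
import hashlib


class EdgeCountMinSketch:
    """Count-Min sketch over (src, dst) edges; assumes non-negative weights."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, src, dst):
        # One hash function per row, derived by salting the key with the row id.
        key = f"{row}|{src}|{dst}".encode("utf-8")
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, src, dst, weight=1):
        """Add `weight` to the edge's counter in every row: O(depth) time."""
        for row in range(self.depth):
            self.table[row][self._index(row, src, dst)] += weight

    def estimate(self, src, dst):
        """Return an estimate that never underestimates the true edge weight."""
        return min(
            self.table[row][self._index(row, src, dst)]
            for row in range(self.depth)
        )
```

Hash collisions can only inflate a counter, so taking the minimum across rows gives an upper-biased estimate whose error shrinks as `width` grows.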

Data Structures and Algorithms

Time Constrained Continuous Subgraph Search over Streaming Graphs

304 - Youhuan Li, Lei Zou, M. Tamer Ozsu 2018

The growing popularity of dynamic applications such as social networks provides a promising way to detect valuable information in real time. Efficient analysis of high-speed data from dynamic applications is of great significance. Data from these dynamic applications can be easily modeled as streaming graphs. In this paper, we study subgraph (isomorphism) search over streaming graph data that obeys timing order constraints on the occurrence of edges in the stream. We propose a data structure and algorithm to efficiently answer subgraph search, introduce optimizations that greatly reduce the space cost, and propose concurrency management to improve system throughput. Extensive experiments on real network traffic data and synthetic social streaming data confirm the efficiency and effectiveness of our solution.

Databases
