Core decomposition is a fundamental graph problem with a large number of applications. Most existing approaches to core decomposition assume that the graph is kept in the main memory of a single machine. Nevertheless, many real-world graphs are big and may not fit in memory. In the literature, there is only one work on I/O efficient core decomposition that avoids loading the whole graph into memory. However, this approach does not scale to big graphs because it cannot bound the memory size and may load most of the graph into memory. In addition, this approach can hardly handle graph updates. In this paper, we study I/O efficient core decomposition following a semi-external model, which only allows node information to be loaded into memory. This model works well for many web-scale graphs. We propose a semi-external algorithm and two optimized algorithms for I/O efficient core decomposition that use very simple structures and a simple data access model. To handle dynamic graph updates, we show that our algorithm can be naturally extended to handle edge deletion. We also propose an I/O efficient core maintenance algorithm to handle edge insertion, and an improved algorithm that further reduces I/O and CPU cost by exploiting some new graph properties. We conduct extensive experiments on 12 large real-world graphs. Our optimal algorithm significantly outperforms the existing I/O efficient algorithm in terms of both processing time and memory consumption. On many memory-resident graphs, our algorithms for both core decomposition and maintenance can even outperform the in-memory algorithm because of the simple structures and data access model used. Our algorithms scale to web-scale graphs. As an example, we are the first to handle a web graph with 978.5 million nodes and 42.6 billion edges using less than 4.2 GB of memory.
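To make the semi-external model concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes a well-known locality-based iteration: every node starts with its degree as an upper bound on its core number, and repeated sequential scans of the adjacency lists lower each estimate to the largest k such that at least k neighbors have an estimate of at least k. Only the per-node array `core` stays in memory. The names `semi_external_core_numbers`, `local_core`, and the `scan` callback (a stand-in for one sequential pass over an on-disk edge file) are hypothetical.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical stand-in for one sequential pass over the on-disk graph:
# each call yields (node, neighbor list) pairs, one node at a time.
AdjacencyScan = Callable[[], Iterator[Tuple[int, List[int]]]]


def local_core(estimates: List[int], neighbors: List[int], upper: int) -> int:
    """Largest k <= upper such that at least k neighbors have estimate >= k."""
    if upper == 0:
        return 0
    # counts[i] = number of neighbors whose estimate, capped at upper, is i.
    counts = [0] * (upper + 1)
    for u in neighbors:
        counts[min(estimates[u], upper)] += 1
    seen = 0
    for k in range(upper, 0, -1):
        seen += counts[k]  # neighbors with estimate >= k
        if seen >= k:
            return k
    return 0


def semi_external_core_numbers(num_nodes: int, scan: AdjacencyScan) -> List[int]:
    """Compute core numbers while keeping only O(num_nodes) state in memory."""
    core = [0] * num_nodes
    # First pass: the degree is an upper bound on the core number.
    for v, neighbors in scan():
        core[v] = len(neighbors)
    # Further passes: lower the estimates until a full scan changes nothing.
    changed = True
    while changed:
        changed = False
        for v, neighbors in scan():
            new_value = local_core(core, neighbors, core[v])
            if new_value < core[v]:
                core[v] = new_value
                changed = True
    return core


if __name__ == "__main__":
    # Toy graph: triangle 0-1-2, pendant node 3 attached to 2, isolated node 4.
    adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2], 4: []}
    print(semi_external_core_numbers(5, lambda: iter(adjacency.items())))
```

In this sketch the in-memory dictionary only simulates the edge file; in a real semi-external setting `scan` would stream adjacency lists from disk, so memory use depends on the number of nodes rather than the number of edges.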
A challenge for data imputation is the lack of knowledge. In this paper, we attempt to address this challenge by involving extra knowledge from the web. To achieve high-performance web-based imputation, we use the dependency, i.e., FDs and CFDs, to impute
With the magnitude of graph-structured data continually increasing, graph processing systems that can scale-out and scale-up are needed to handle extreme-scale datasets. While existing distributed out-of-core solutions have made it possible, they suf
The ROOT TTree data format encodes hundreds of petabytes of High Energy and Nuclear Physics events. Its columnar layout drives rapid analyses, as only those parts (branches) that are really used in a given analysis need to be read from storage. Its u
Recent studies showed that single-machine graph processing systems can be as competitive as cluster-based approaches on large-scale problems. While several out-of-core graph processing systems and computation models have been proposed, the hig
Many studies have been conducted on seeking the efficient solution for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Descript