ComPAS: Community Preserving Sampling for Streaming Graphs


Abstract in English

In the era of big data, graph sampling is indispensable in many settings. Existing sampling methods are mostly designed for static graphs, and aim to preserve basic structural properties of the original graph (such as degree distribution, clustering coefficient etc.) in the sample. We argue that for any sampling method it is impossible to produce an universal representative sample which can preserve all the properties of the original graph; rather sampling should be application specific (such as preserving hubs - needed for information diffusion). Here we consider community detection as an application scenario. We propose ComPAS, a novel sampling strategy that unlike previous methods, is not only designed for streaming graphs (which is a more realistic representation of a real-world scenario) but also preserves the community structure of the original graph in the sample. Empirical results on both synthetic and different real-world graphs show that ComPAS is the best to preserve the underlying community structure with average performance reaching 73.2% of the most informed algorithm for static graphs.

Download