Sampling Graphlets of Multi-layer Networks: A Restricted Random Walk Approach


الملخص بالإنكليزية

Graphlets are induced subgraph patterns that are crucial to the understanding of the structure and function of a large network. A lot of efforts have been devoted to calculating graphlet statistics where random walk based approaches are commonly used to access restricted graphs through the available application programming interfaces (APIs). However, most of them merely consider individual networks while overlooking the strong coupling between different networks. In this paper, we estimate the graphlet concentration in multi-layer networks with real-world applications. An inter-layer edge connects two nodes in different layers if they belong to the same person. The access to a multi-layer network is restrictive in the sense that the upper layer allows random walk sampling, whereas the nodes of lower layers can be accessed only though the inter-layer edges and only support random node or edge sampling. To cope with this new challenge, we define a suit of two-layer graphlets and propose a novel random walk sampling algorithm to estimate the proportion of all the 3-node graphlets. An analytical bound on the sampling steps is proved to guarantee the convergence of our unbiased estimator. We further generalize our algorithm to explore the tradeoff between the estimated accuracies of different graphlets when the sample size is split on different layers. Experimental evaluation on real-world and synthetic multi-layer networks demonstrate the accuracy and high efficiency of our unbiased estimators.

تحميل البحث