ﻻ يوجد ملخص باللغة العربية
Estimating distributions of node characteristics (labels) such as number of connections or citizenship of users in a social network via edge and node sampling is a vital part of the study of complex networks. Due to its low cost, sampling via a random walk (RW) has been proposed as an attractive solution to this task. Most RW methods assume either that the network is undirected or that walkers can traverse edges regardless of their direction. Some RW methods have been designed for directed networks where edges coming into a node are not directly observable. In this work, we propose Directed Unbiased Frontier Sampling (DUFS), a sampling method based on a large number of coordinated walkers, each starting from a node chosen uniformly at random. It is applicable to directed networks with invisible incoming edges because it constructs, in real-time, an undirected graph consistent with the walkers trajectories, and due to the use of random jumps which prevent walkers from being trapped. DUFS generalizes previous RW methods and is suited for undirected networks and to directed networks regardless of in-edges visibility. We also propose an improved estimator of node label distributions that combines information from the initial walker locations with subsequent RW observations. We evaluate DUFS, compare it to other RW methods, investigate the impact of its parameters on estimation accuracy and provide practical guidelines for choosing them. In estimating out-degree distributions, DUFS yields significantly better estimates of the head of the distribution than other methods, while matching or exceeding estimation accuracy of the tail. Last, we show that DUFS outperforms uniform node sampling when estimating distributions of node labels of the top 10% largest degree nodes, even when sampling a node uniformly has the same cost as RW steps.
Random walk-based sampling methods are gaining popularity and importance in characterizing large networks. While powerful, they suffer from the slow mixing problem when the graph is loosely connected, which results in poor estimation accuracy. Random
How popular a topic or an opinion appears to be in a network can be very different from its actual popularity. For example, in an online network of a social media platform, the number of people who mention a topic in their posts---i.e., its global po
Identifying hierarchies and rankings of nodes in directed graphs is fundamental in many applications such as social network analysis, biology, economics, and finance. A recently proposed method identifies the hierarchy by finding the ordered partitio
In a graph, a community may be loosely defined as a group of nodes that are more closely connected to one another than to the rest of the graph. While there are a variety of metrics that can be used to specify the quality of a given community, one co
Exploring small connected and induced subgraph patterns (CIS patterns, or graphlets) has recently attracted considerable attention. Despite recent efforts on computing the number of instances a specific graphlet appears in a large graph (i.e., the to