
Estimating Properties of Social Networks via Random Walk considering Private Nodes

Published by Kazuki Nakajima
Publication date: 2020
Research field: Informatics Engineering
Research language: English





Accurately analyzing graph properties of social networks is a challenging task because of access limitations to the graph data. To address this challenge, several algorithms to obtain unbiased estimates of properties from few samples via a random walk have been studied. However, existing algorithms do not consider private nodes who hide their neighbors in real social networks, leading to some practical problems. Here we design random walk-based algorithms to accurately estimate properties without any problems caused by private nodes. First, we design a random walk-based sampling algorithm that comprises the neighbor selection to obtain samples having the Markov property and the calculation of weights for each sample to correct the sampling bias. Further, for two graph property estimators, we propose the weighting methods to reduce not only the sampling bias but also estimation errors due to private nodes. The proposed algorithms improve the estimation accuracy of the existing algorithms by up to 92.6% on real-world datasets.
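As background for the abstract above, the following is a minimal sketch of the standard re-weighted random walk estimator for the average degree. It is not the private-node-aware algorithm proposed in the paper: the toy graph `adj`, the function names, and the assumption that every node reveals its neighbors are all illustrative choices made here.

```python
import random


def random_walk(adj, start, steps, rng):
    """Simple random walk: record the current node, then move to a
    uniformly chosen neighbor. Returns the list of visited nodes."""
    node = start
    samples = []
    for _ in range(steps):
        samples.append(node)
        node = rng.choice(adj[node])
    return samples


def estimate_average_degree(adj, samples):
    """Re-weighted estimator. A simple random walk visits nodes in
    proportion to their degree, so each sample is weighted by 1/degree
    to correct that bias."""
    weights = [1.0 / len(adj[v]) for v in samples]
    return len(samples) / sum(weights)


# Hypothetical toy graph (undirected, connected, non-bipartite).
adj = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1, 3],
    3: [0, 2],
}

rng = random.Random(0)
samples = random_walk(adj, start=0, steps=20000, rng=rng)
estimate = estimate_average_degree(adj, samples)
true_avg = sum(len(neighbors) for neighbors in adj.values()) / len(adj)
print(f"estimated average degree: {estimate:.3f}, true: {true_avg:.3f}")
```

In this sketch every call to `adj[node]` succeeds. A private node would correspond to a node whose neighbor list cannot be queried, which is the situation the paper's sampling and weighting methods are designed to handle.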




Read also

Random walk-based sampling methods are gaining popularity and importance in characterizing large networks. While powerful, they suffer from the slow mixing problem when the graph is loosely connected, which results in poor estimation accuracy. Random walk with jumps (RWwJ) can address the slow mixing problem, but it is inapplicable if the graph does not support uniform vertex sampling (UNI). In this work, we develop methods that can efficiently sample a graph without requiring UNI while still enjoying benefits similar to those of RWwJ. We observe that many graphs under study, called target graphs, do not exist in isolation. In many situations, a target graph is related to an auxiliary graph and a bipartite graph, and together they form a better connected two-layered network structure. This new viewpoint brings extra benefits to graph sampling: if directly sampling a target graph is difficult, we can sample it indirectly with the assistance of the other two graphs. We propose a series of new graph sampling techniques that exploit such a two-layered network structure to estimate target graph characteristics. Experiments conducted on both synthetic and real-world networks demonstrate the effectiveness and usefulness of these new techniques.
Complex systems, abstractly represented as networks, are ubiquitous in everyday life. Analyzing and understanding these systems requires, among others, tools for community detection. As no single best community detection algorithm can exist, robustness across a wide variety of problem settings is desirable. In this work, we present Synwalk, a random walk-based community detection method. Synwalk builds upon a solid theoretical basis and detects communities by synthesizing the random walk induced by the given network from a class of candidate random walks. We thoroughly validate the effectiveness of our approach on synthetic and empirical networks, and compare Synwalk's performance with the performance of Infomap and Walktrap. Our results indicate that Synwalk performs robustly on networks with varying mixing parameters and degree distributions. We outperform Infomap on networks with high mixing parameter, and Infomap and Walktrap on networks with many small communities and low average degree. Our work has the potential to inspire further development of community detection via synthesis of random walks, and we provide concrete ideas for future research.
In this paper, we introduce a novel, general-purpose technique for faster sampling of nodes over an online social network. Specifically, unlike traditional random walks, which wait for the sampling distribution to converge to a predetermined target distribution (a waiting process that incurs a high query cost), we develop WALK-ESTIMATE, which starts with a much shorter random walk and then proactively estimates the sampling probability of the node taken, before using acceptance-rejection sampling to adjust the sampling probability to the predetermined target distribution. We present a novel backward random walk technique which provides provably unbiased estimations for the sampling probability, and demonstrate the superiority of WALK-ESTIMATE over traditional random walks through theoretical analysis and extensive experiments over real-world online social networks.
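The acceptance-rejection step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the sampling probability of each node is known exactly, whereas WALK-ESTIMATE estimates it with a backward random walk; the node names, degrees, and function names below are hypothetical.

```python
import random
from collections import Counter


def rejection_sample(draw, sampling_prob, target_prob, scale, rng):
    """Draw candidates until one is accepted. Accepted nodes follow
    target_prob, provided that
    scale * target_prob(v) / sampling_prob(v) <= 1 for every node v."""
    while True:
        v = draw()
        if rng.random() < scale * target_prob(v) / sampling_prob(v):
            return v


# Hypothetical nodes with known degrees. Candidates are drawn in
# proportion to degree, as a long simple random walk would produce.
degrees = {"a": 1, "b": 2, "c": 3, "d": 4}
nodes = list(degrees)
total_degree = sum(degrees.values())
rng = random.Random(1)


def draw():
    return rng.choices(nodes, weights=[degrees[v] for v in nodes])[0]


def sampling_prob(v):
    return degrees[v] / total_degree


def target_prob(v):
    return 1.0 / len(nodes)  # uniform target distribution


# Largest scale that keeps every acceptance probability at or below 1.
scale = min(sampling_prob(v) / target_prob(v) for v in nodes)

num_samples = 20000
counts = Counter(
    rejection_sample(draw, sampling_prob, target_prob, scale, rng)
    for _ in range(num_samples)
)
frequencies = {v: counts[v] / num_samples for v in nodes}
print(frequencies)
```

High-degree nodes are drawn more often but accepted less often, so the accepted samples are approximately uniform over the four nodes.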
Graphlets are induced subgraph patterns that are crucial to the understanding of the structure and function of a large network. Much effort has been devoted to calculating graphlet statistics, where random walk-based approaches are commonly used to access restricted graphs through the available application programming interfaces (APIs). However, most of them merely consider individual networks while overlooking the strong coupling between different networks. In this paper, we estimate the graphlet concentration in multi-layer networks with real-world applications. An inter-layer edge connects two nodes in different layers if they belong to the same person. The access to a multi-layer network is restrictive in the sense that the upper layer allows random walk sampling, whereas the nodes of lower layers can be accessed only through the inter-layer edges and only support random node or edge sampling. To cope with this new challenge, we define a suite of two-layer graphlets and propose a novel random walk sampling algorithm to estimate the proportion of all the 3-node graphlets. An analytical bound on the sampling steps is proved to guarantee the convergence of our unbiased estimator. We further generalize our algorithm to explore the tradeoff between the estimation accuracies of different graphlets when the sample size is split on different layers. Experimental evaluation on real-world and synthetic multi-layer networks demonstrates the accuracy and high efficiency of our unbiased estimators.
Existing socio-psychological studies suggest that users of a social network form their opinions relying on the opinions of their neighbors. According to the DeGroot opinion formation model, one value of particular importance is the asymptotic consensus value, the sum of user opinions weighted by the users' eigenvector centralities. This value plays the role of an attractor for the opinions in the network and is a lucrative target for external influence. However, since any potentially malicious control of the opinion distribution in a social network is clearly undesirable, it is important to design methods to prevent external attempts to strategically change the asymptotic consensus value. In this work, we assume that the adversary wants to maximize the asymptotic consensus value by altering the opinions of some users in a network; we then state DIVER, an NP-hard problem of disabling such external influence attempts by strategically adding a limited number of edges to the network. Relying on the theory of Markov chains, we provide perturbation analysis that shows how eigenvector centrality and, hence, DIVER's objective function change in response to an edge's addition to the network. The latter leads to the design of a pseudo-linear-time heuristic for DIVER, whose computation relies on efficient estimation of mean first passage times in a Markov chain. We confirm our theoretical findings in experiments.
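The asymptotic consensus value described in this abstract can be checked numerically on a small example. The sketch below assumes a row-stochastic influence matrix `W` and takes the centrality weights to be its stationary distribution (the left eigenvector for eigenvalue 1); the matrix, the initial opinions, and the function names are made up for illustration and do not come from the paper.

```python
def stationary_distribution(W, iters=1000):
    """Power iteration for the left eigenvector pi with pi = pi W,
    normalized so that its entries sum to 1."""
    n = len(W)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * W[i][j] for i in range(n)) for j in range(n)]
    return pi


def degroot_step(W, x):
    """One DeGroot update: each user adopts the weighted average of the
    opinions of the users they listen to."""
    n = len(x)
    return [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]


# Hypothetical 3-user network; every row sums to 1.
W = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
x0 = [1.0, 0.0, 0.5]  # initial opinions

pi = stationary_distribution(W)
consensus = sum(p * x for p, x in zip(pi, x0))

# Iterating the opinion dynamics should drive every opinion to the
# same centrality-weighted value.
x = x0
for _ in range(200):
    x = degroot_step(W, x)

print("centrality weights:", pi)
print("predicted consensus:", consensus)
print("opinions after 200 steps:", x)
```

For this matrix the weights are (0.25, 0.5, 0.25), so the predicted consensus is 0.25 · 1.0 + 0.5 · 0.0 + 0.25 · 0.5 = 0.375, which the iterated opinions approach.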