
Block Models and Personalized PageRank

Published by Johan Ugander
Publication date: 2016
Research field: Informatics Engineering
Paper language: English





Methods for ranking the importance of nodes in a network have a rich history in machine learning and across domains that analyze structured data. Recent work has evaluated these methods through the seed set expansion problem: given a subset $S$ of nodes from a community of interest in an underlying graph, can we reliably identify the rest of the community? We start from the observation that the most widely used techniques for this problem, personalized PageRank and heat kernel methods, operate in the space of landing probabilities of a random walk rooted at the seed set, ranking nodes according to weighted sums of the landing probabilities of walks of different lengths. Both schemes, however, lack an a priori relationship to the seed set objective. In this work we develop a principled framework for evaluating ranking methods by studying seed set expansion applied to the stochastic block model. We derive the optimal gradient for separating the landing probabilities of two classes in a stochastic block model, and find, surprisingly, that under reasonable assumptions the gradient is asymptotically equivalent to personalized PageRank for a specific choice of the PageRank parameter $\alpha$ that depends on the block model parameters. This connection provides a novel formal motivation for the success of personalized PageRank in seed set expansion and node ranking generally. We use this connection to propose more advanced techniques incorporating higher moments of landing probabilities; our advanced methods exhibit greatly improved performance despite being simple linear classification rules, and are even competitive with belief propagation.
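The description of personalized PageRank and heat kernel methods as weighted sums of landing probabilities can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the toy graph, the truncation at `num_steps`, and the convention that $\alpha$ is the walk's continuation probability (weights proportional to $\alpha^k$) are all assumptions made for illustration.

```python
import math

def landing_probabilities(adj, seeds, num_steps):
    """Return [r_0, ..., r_K], where r_k[v] is the probability that a k-step
    random walk started at a uniformly random seed is at node v.
    Assumes every node has at least one neighbor."""
    n = len(adj)
    r = [0.0] * n
    for s in seeds:
        r[s] += 1.0 / len(seeds)
    out = [r]
    for _ in range(num_steps):
        nxt = [0.0] * n
        for u, p in enumerate(r):
            if p == 0.0:
                continue
            share = p / len(adj[u])  # walk moves to a uniform random neighbor
            for v in adj[u]:
                nxt[v] += share
        r = nxt
        out.append(r)
    return out

def weighted_score(landing, weights):
    """Score each node by a weighted sum of its landing probabilities."""
    n = len(landing[0])
    return [sum(w * r[v] for w, r in zip(weights, landing)) for v in range(n)]

def pagerank_weights(alpha, num_steps):
    # Assumed convention: geometric weights (1 - alpha) * alpha^k, truncated.
    return [(1 - alpha) * alpha ** k for k in range(num_steps + 1)]

def heat_kernel_weights(t, num_steps):
    # Poisson weights e^{-t} t^k / k!, truncated.
    return [math.exp(-t) * t ** k / math.factorial(k) for k in range(num_steps + 1)]

# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
landing = landing_probabilities(adj, seeds=[0], num_steps=10)
ppr_scores = weighted_score(landing, pagerank_weights(0.5, 10))
hk_scores = weighted_score(landing, heat_kernel_weights(1.0, 10))
ranking = sorted(range(len(adj)), key=lambda v: ppr_scores[v], reverse=True)
```

Both rankings use the same landing-probability vectors and differ only in the weight sequence, which is the space in which the abstract compares ranking methods.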




Read also

Given a graph $G$, a source node $s$ and a target node $t$, the personalized PageRank (PPR) of $t$ with respect to $s$ is the probability that a random walk starting from $s$ terminates at $t$. An important variant of the PPR query is single-source PPR (SSPPR), which enumerates all nodes in $G$, and returns the top-$k$ nodes with the highest PPR values with respect to a given source $s$. PPR in general and SSPPR in particular have important applications in web search and social networks, e.g., in Twitter's Who-To-Follow recommendation service. However, PPR computation is known to be expensive on large graphs, and resistant to indexing. Consequently, previous solutions either use heuristics, which do not guarantee result quality, or rely on the strong computing power of modern data centers, which is costly. Motivated by this, we propose effective index-free and index-based algorithms for approximate PPR processing, with rigorous guarantees on result quality. We first present FORA, an approximate SSPPR solution that combines two existing methods, Forward Push (which is fast but does not guarantee quality) and Monte Carlo Random Walk (accurate but slow), in a simple and yet non-trivial way, leading to both high accuracy and efficiency. Further, FORA includes a simple and effective indexing scheme, as well as a module for top-$k$ selection with high pruning power. Extensive experiments demonstrate that the proposed solutions are orders of magnitude more efficient than their respective competitors. Notably, on a billion-edge Twitter dataset, FORA answers a top-500 approximate SSPPR query within 1 second, using a single commodity server.
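To make the combination of Forward Push and Monte Carlo random walks more concrete, here is a simplified Python sketch of the general idea: push probability mass deterministically until every residue is small, then settle the remaining residues with random walks. It is not the FORA implementation. The function names, the parameters `r_max` and `walks_per_unit`, the toy graph, and the convention that `alpha` is the per-step termination probability are assumptions, and the sketch carries none of the paper's guarantees, indexing, or top-$k$ pruning.

```python
import math
import random
from collections import deque

def forward_push(adj, source, alpha, r_max):
    """Push residue from `source` until residue[u] / deg(u) <= r_max everywhere.
    Assumes no self-loops and that every node has at least one neighbor."""
    reserve, residue = {}, {source: 1.0}
    queue, queued = deque([source]), {source}
    while queue:
        u = queue.popleft()
        queued.discard(u)
        r_u = residue.get(u, 0.0)
        if r_u / len(adj[u]) <= r_max:
            continue
        reserve[u] = reserve.get(u, 0.0) + alpha * r_u  # mass that stops at u
        residue[u] = 0.0
        share = (1 - alpha) * r_u / len(adj[u])         # mass that moves on
        for v in adj[u]:
            residue[v] = residue.get(v, 0.0) + share
            if v not in queued and residue[v] / len(adj[v]) > r_max:
                queue.append(v)
                queued.add(v)
    return reserve, residue

def random_walk_endpoint(adj, start, alpha, rng):
    """Walk from `start`, stopping with probability alpha before each step."""
    node = start
    while rng.random() >= alpha:
        node = rng.choice(adj[node])
    return node

def approx_ppr(adj, source, alpha=0.2, r_max=1e-3, walks_per_unit=10_000, seed=0):
    """Estimate PPR from `source`: reserves plus sampled endpoints of walks
    started at nodes with leftover residue."""
    rng = random.Random(seed)
    reserve, residue = forward_push(adj, source, alpha, r_max)
    estimate = dict(reserve)
    for v, r_v in residue.items():
        if r_v <= 0.0:
            continue
        n_walks = max(1, math.ceil(r_v * walks_per_unit))
        for _ in range(n_walks):
            t = random_walk_endpoint(adj, v, alpha, rng)
            estimate[t] = estimate.get(t, 0.0) + r_v / n_walks
    return estimate

# Hypothetical toy graph: two triangles joined by the edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
estimate = approx_ppr(adj, source=0)
top_k = sorted(estimate, key=estimate.get, reverse=True)[:3]
```

Lowering `r_max` shifts work toward the deterministic push phase, while raising it leaves more residue for the random walks; the abstract's point is that balancing the two can be faster than either method alone.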
While PageRank has been extensively used to rank sport tournament participants (teams or individuals), its superiority over simpler ranking methods has never been clearly demonstrated. We use sports results from 18 major leagues to calibrate a state-of-the-art model for synthetic sports results. Model data are then used to assess the ranking performance of PageRank in a controlled setting. We find that PageRank outperforms the benchmark ranking by the number of wins only when a small fraction of all games have been played. Increased randomness in the data, such as intrinsic randomness of outcomes or advantage of home teams, further reduces the range of PageRank's superiority. We propose a new PageRank variant which outperforms PageRank in all evaluated settings, yet shares its sensitivity to increased randomness in the data. Our main findings are confirmed by evaluating the ranking algorithms on real data. Our work demonstrates the danger of using novel metrics and algorithms without considering their limits of applicability.
We provide the first information-theoretically tight analysis for inference of latent community structure given a sparse graph along with high-dimensional node covariates, correlated with the same latent communities. Our work bridges recent theoretical breakthroughs in the detection of latent community structure without node covariates and a large body of empirical work using diverse heuristics for combining node covariates with graphs for inference. The tightness of our analysis implies, in particular, the information-theoretic necessity of combining the different sources of information. Our analysis holds for networks of large degrees as well as for a Gaussian version of the model.
In federated learning, models are learned from users' data that are held private on their edge devices, by aggregating them in the service provider's cloud to obtain a global model. Such a global model is of great commercial value in, e.g., improving the customers' experience. In this paper we focus on two possible areas of improvement of the state of the art. First, we take the difference between user habits into account and propose a quadratic penalty-based formulation for efficient learning of the global model that allows local models to be personalized. Second, we address the latency issue associated with the heterogeneous training time on edge devices, by exploiting a hierarchical structure modeling communication not only between the cloud and edge devices, but also within the cloud. Specifically, we devise a tailored block coordinate descent-based computation scheme, accompanied by communication protocols for both the synchronous and asynchronous cloud settings. We characterize the theoretical convergence rate of the algorithm, and provide a variant that performs empirically better. We also prove that the asynchronous protocol, inspired by multi-agent consensus techniques, has the potential for large gains in latency compared to a synchronous setting when the edge-device updates are intermittent. Finally, experimental results are provided that not only corroborate the theory, but also show that the system leads to faster convergence for personalized models on the edge devices, compared to the state of the art.
We study the evolution of cooperation in populations where individuals play the prisoner's dilemma on a network. Every node of the network corresponds to an individual choosing whether to cooperate or defect in a repeated game. The players revise their actions by imitating those neighbors who have higher payoffs. We show that when the interactions take place on graphs with large girth, cooperation is more likely to emerge. On the flip side, in graphs with many cycles of length 3 and 4, defection spreads more rapidly. One of the key ideas of our analysis is that our dynamics can be seen as a perturbation of the voter model. We write the transition kernel of the corresponding Markov chain in terms of the pairwise correlations in the voter model. We analyze the pairwise correlation and show that in graphs with relatively large girth, cooperators cluster and help each other.