ترغب بنشر مسار تعليمي؟ اضغط هنا

Asynchronous iterative computations with Web information retrieval structures: The PageRank case

62   0   0.0 ( 0 )
 نشر من قبل Giorgos Kollias
 تاريخ النشر 2006
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

There are several ideas being used today for Web information retrieval, and specifically in Web search engines. The PageRank algorithm is one of those that introduce a content-neutral ranking function over Web pages. This ranking is applied to the set of pages returned by the Google search engine in response to posting a search query. PageRank is based in part on two simple common sense concepts: (i)A page is important if many important pages include links to it. (ii)A page containing many links has reduced impact on the importance of the pages it links to. In this paper we focus on asynchronous iterative schemes to compute PageRank over large sets of Web pages. The elimination of the synchronizing phases is expected to be advantageous on heterogeneous platforms. The motivation for a possible move to such large scale distributed platforms lies in the size of matrices representing Web structure. In orders of magnitude: $10^{10}$ pages with $10^{11}$ nonzero elements and $10^{12}$ bytes just to store a small percentage of the Web (the already crawled); distributed memory machines are necessary for such computations. The present research is part of our general objective, to explore the potential of asynchronous computational models as an underlying framework for very large scale computations over the Grid. The area of ``internet algorithmics appears to offer many occasions for computations of unprecedent dimensionality that would be good candidates for this framework.

قيم البحث

اقرأ أيضاً

This work studies a fully distributed algorithm for computing the PageRank vector, which is inspired by the Matching Pursuit and features: 1) a fully distributed implementation 2) convergence in expectation with exponential rate 3) low storage requir ement (two scalar values per page). Illustrative experiments are conducted to verify the findings.
A mediator can help non-cooperative agents obtain an equilibrium that may otherwise not be possible. We study the ability of players to obtain the same equilibrium without a mediator, using only cheap talk, that is, nonbinding pre-play communication. Previous work has considered this problem in a synchronous setting. Here we consider the effect of asynchrony on the problem, and provide upper bounds for implementing mediators. Considering asynchronous environments introduces new subtleties, including exactly what solution concept is most appropriate and determining what move is played if the cheap talk goes on forever. Different results are obtained depending on whether the move after such infinite play is under the control of the players or part of the description of the game.
We consider the problem of making distributed computations robust to noise, in particular to worst-case (adversarial) corruptions of messages. We give a general distributed interactive coding scheme which simulates any asynchronous distributed protoc ol while tolerating an optimal corruption of a $Theta(1/n)$ fraction of all messages while incurring a moderate blowup of $O(nlog^2 n)$ in the communication complexity. Our result is the first fully distributed interactive coding scheme in which the topology of the communication network is not known in advance. Prior work required either a coordinating node to be connected to all other nodes in the network or assumed a synchronous network in which all nodes already know the complete topology of the network.
In era of ever-expanding data and knowledge, we lack a centralized system that maps all the faculties to their research works. This problem has not been addressed in the past and it becomes challenging for students to connect with the right faculty o f their domain. Since we have so many colleges and faculties this lies in the category of big data problem. In this paper, we present a model which works on the distributed computing environment to tackle big data. The proposed model uses apache spark as an execution engine and hive as database. The results are visualized with the help of Tableau that is connected to Apache Hive to achieve distributed computing.
We propose FrogWild, a novel algorithm for fast approximation of high PageRank vertices, geared towards reducing network costs of running traditional PageRank algorithms. Our algorithm can be seen as a quantized version of power iteration that perfor ms multiple parallel random walks over a directed graph. One important innovation is that we introduce a modification to the GraphLab framework that only partially synchronizes mirror vertices. This partial synchronization vastly reduces the network traffic generated by traditional PageRank algorithms, thus greatly reducing the per-iteration cost of PageRank. On the other hand, this partial synchronization also creates dependencies between the random walks used to estimate PageRank. Our main theoretical innovation is the analysis of the correlations introduced by this partial synchronization process and a bound establishing that our approximation is close to the true PageRank vector. We implement our algorithm in GraphLab and compare it against the default PageRank implementation. We show that our algorithm is very fast, performing each iteration in less than one second on the Twitter graph and can be up to 7x faster compared to the standard GraphLab PageRank implementation.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا