ترغب بنشر مسار تعليمي؟ اضغط هنا

Solving the accuracy-diversity dilemma via directed random walks

115   0   0.0 ( 0 )
 نشر من قبل Jianguo Liu
 تاريخ النشر 2012
والبحث باللغة English




اسأل ChatGPT حول البحث

Random walks have been successfully used to measure user or object similarities in collaborative filtering (CF) recommender systems, which is of high accuracy but low diversity. A key challenge of CF system is that the reliably accurate results are obtained with the help of peers recommendation, but the most useful individual recommendations are hard to be found among diverse niche objects. In this paper we investigate the direction effect of the random walk on user similarity measurements and find that the user similarity, calculated by directed random walks, is reverse to the initial nodes degree. Since the ratio of small-degree users to large-degree users is very large in real data sets, the large-degree users selections are recommended extensively by traditional CF algorithms. By tuning the user similarity direction from neighbors to the target user, we introduce a new algorithm specifically to address the challenge of diversity of CF and show how it can be used to solve the accuracy-diversity dilemma. Without relying on any context-specific information, we are able to obtain accurate and diverse recommendations, which outperforms the state-of-the-art CF methods. This work suggests that the random walk direction is an important factor to improve the personalized recommendation performance.



قيم البحث

اقرأ أيضاً

Recommender systems use data on past user preferences to predict possible future likes and interests. A key challenge is that while the most useful individual recommendations are to be found among diverse niche objects, the most reliably accurate res ults are obtained by methods that recommend objects based on user or object similarity. In this paper we introduce a new algorithm specifically to address the challenge of diversity and show how it can be used to resolve this apparent dilemma when combined in an elegant hybrid with an accuracy-focused algorithm. By tuning the hybrid appropriately we are able to obtain, without relying on any semantic or context-specific information, simultaneous gains in both accuracy and diversity of recommendations.
Uncovering factors underlying the network formation is a long-standing challenge for data mining and network analysis. In particular, the microscopic organizing principles of directed networks are less understood than those of undirected networks. Th is article proposes a hypothesis named potential theory, which assumes that every directed link corresponds to a decrease of a unit potential and subgraphs with definable potential values for all nodes are preferred. Combining the potential theory with the clustering and homophily mechanisms, it is deduced that the Bi-fan structure consisting of 4 nodes and 4 directed links is the most favored local structure in directed networks. Our hypothesis receives strongly positive supports from extensive experiments on 15 directed networks drawn from disparate fields, as indicated by the most accurate and robust performance of Bi-fan predictor within the link prediction framework. In summary, our main contribution is twofold: (i) We propose a new mechanism for the local organization of directed networks; (ii) We design the corresponding link prediction algorithm, which can not only testify our hypothesis, but also find out direct applications in missing link prediction and friendship recommendation.
The usual development of the continuous time random walk (CTRW) assumes that jumps and time intervals are a two-dimensional set of independent and identically distributed random variables. In this paper we address the theoretical setting of non-indep endent CTRWs where consecutive jumps and/or time intervals are correlated. An exact solution to the problem is obtained for the special but relevant case in which the correlation solely depends on the signs of consecutive jumps. Even in this simple case some interesting features arise such as transitions from unimodal to bimodal distributions due to correlation. We also develop the necessary analytical techniques and approximations to handle more general situations that can appear in practice.
Influence maximization, defined as a problem of finding a set of seed nodes to trigger a maximized spread of influence, is crucial to viral marketing on social networks. For practical viral marketing on large scale social networks, it is required tha t influence maximization algorithms should have both guaranteed accuracy and high scalability. However, existing algorithms suffer a scalability-accuracy dilemma: conventional greedy algorithms guarantee the accuracy with expensive computation, while the scalable heuristic algorithms suffer from unstable accuracy. In this paper, we focus on solving this scalability-accuracy dilemma. We point out that the essential reason of the dilemma is the surprising fact that the submodularity, a key requirement of the objective function for a greedy algorithm to approximate the optimum, is not guaranteed in all conventional greedy algorithms in the literature of influence maximization. Therefore a greedy algorithm has to afford a huge number of Monte Carlo simulations to reduce the pain caused by unguaranteed submodularity. Motivated by this critical finding, we propose a static greedy algorithm, named StaticGreedy, to strictly guarantee the submodularity of influence spread function during the seed selection process. The proposed algorithm makes the computational expense dramatically reduced by two orders of magnitude without loss of accuracy. Moreover, we propose a dynamical update strategy which can speed up the StaticGreedy algorithm by 2-7 times on large scale social networks.
Heat conduction process has recently found its application in personalized recommendation [T. Zhou emph{et al.}, PNAS 107, 4511 (2010)], which is of high diversity but low accuracy. By decreasing the temperatures of small-degree objects, we present a n improved algorithm, called biased heat conduction (BHC), which could simultaneously enhance the accuracy and diversity. Extensive experimental analyses demonstrate that the accuracy on MovieLens, Netflix and Delicious datasets could be improved by 43.5%, 55.4% and 19.2% compared with the standard heat conduction algorithm, and the diversity is also increased or approximately unchanged. Further statistical analyses suggest that the present algorithm could simultaneously identify users mainstream and special tastes, resulting in better performance than the standard heat conduction algorithm. This work provides a creditable way for highly efficient information filtering.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا