We analyze a large-scale snapshot of del.icio.us and investigate how the number of different tags in the system grows as a function of a suitably defined notion of time. We study the temporal evolution of the global vocabulary size, i.e. the number of distinct tags in the entire system, as well as the evolution of local vocabularies, that is the growth of the number of distinct tags used in the context of a given resource or user. In both cases, we find power-law behaviors with exponents smaller than one. Surprisingly, the observed growth behaviors are remarkably regular throughout the entire history of the system and across very different resources being bookmarked. Similar sub-linear laws of growth have been observed in written text, and this qualitative universality calls for an explanation and points in the direction of non-trivial cognitive processes in the complex interaction patterns characterizing collaborative tagging.
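The vocabulary growth measurement described above can be illustrated with a minimal sketch. It assumes that "time" is simply the running count of tag assignments, which is one possible choice and not necessarily the definition used in the paper, and it estimates the power-law exponent with an ordinary least-squares fit in log-log space. The function names are hypothetical.

```python
import math


def vocabulary_growth(tag_stream):
    """Return the number of distinct tags seen after each tag assignment."""
    seen = set()
    sizes = []
    for tag in tag_stream:
        seen.add(tag)
        sizes.append(len(seen))
    return sizes


def fit_power_law_exponent(sizes):
    """Estimate gamma in N(t) ~ t**gamma as the least-squares slope of
    log N(t) against log t, with t = 1, 2, ..., len(sizes).

    Assumption: t counts tag assignments (an intrinsic notion of time).
    """
    xs = [math.log(t) for t in range(1, len(sizes) + 1)]
    ys = [math.log(n) for n in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

Applied to a chronologically ordered stream of tag assignments, an exponent below one would correspond to the sub-linear growth reported in the abstract. A plain log-log regression is only a rough estimator and is used here for brevity.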
A distributed classification paradigm known as collaborative tagging has been widely adopted in new Web applications designed to manage and share online resources. Users of these applications organize resources (Web pages, digital photographs, academic papers) by associating with them freely chosen text labels, or tags. Here we leverage the social aspects of collaborative tagging and introduce a notion of resource distance based on the collective tagging activity of users. We collect data from a popular system and perform experiments showing that our definition of distance can be used to build a weighted network of resources with a detectable community structure. We show that this community structure clearly exposes the semantic relations among resources. The communities of resources that we observe are a genuinely emergent feature, resulting from the uncoordinated activity of a large number of users, and their detection paves the way for mapping emergent semantics in social tagging systems.
Personalized collaborative filtering recommender systems (CFRSs) are crucial components of popular e-commerce services. In practice, their openness also makes CFRSs particularly vulnerable to shilling attacks, also known as profile injection attacks. Attackers can inject carefully chosen attack profiles into a CFRS in order to bias the recommendation results in their favor. To reduce this risk, various detection techniques have been proposed, which use diverse features extracted from user profiles. However, improving detection performance with a limited set of features appears difficult, since the existing features cannot fully characterize attack profiles and genuine profiles. In this paper, we propose a novel detection method to make recommender systems resistant to such attacks. The existing features fall into two broad groups: those based on rating behavior and those based on item distribution. We first formulate the problem as finding a mapping model between rating behavior and item distribution by exploiting the least-squares approximate solution. Based on the trained model, we design a detector that employs a regressor to detect such attacks. Extensive experiments on the MovieLens-100K and MovieLens-ml-latest-small datasets examine the effectiveness of the proposed detection method. The experimental results validate that our approach outperforms benchmark methods, including KNN.
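As a rough illustration of the least-squares mapping idea, the sketch below fits a linear map from rating-behavior features to item-distribution features and scores each profile by its residual. The abstract does not specify the feature definitions, the regressor, or the decision rule, so everything here is an assumption: the feature matrices, the residual-norm score, the threshold, and the function names are all hypothetical.

```python
import numpy as np


def fit_mapping(behavior, distribution):
    """Least-squares W minimizing ||behavior @ W - distribution||.

    behavior:     (n_profiles, p) rating-behavior features (assumed given)
    distribution: (n_profiles, q) item-distribution features (assumed given)
    """
    W, *_ = np.linalg.lstsq(behavior, distribution, rcond=None)
    return W


def residual_scores(behavior, distribution, W):
    """Per-profile distance between predicted and observed item-distribution
    features. Assumption: a large residual marks a suspicious profile."""
    return np.linalg.norm(behavior @ W - distribution, axis=1)


def flag_profiles(scores, threshold):
    """Boolean mask of profiles whose residual exceeds a chosen threshold."""
    return scores > threshold
```

In this sketch the mapping would be fitted on profiles believed to be genuine, and the threshold would have to be tuned on held-out data. The detector in the paper may differ substantially.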
We present collaborative similarity embedding (CSE), a unified framework that exploits comprehensive collaborative relations available in a user-item bipartite graph for representation learning and recommendation. In the proposed framework, we differentiate two types of proximity relations: direct proximity and k-th order neighborhood proximity. While learning from the former exploits direct user-item associations observable from the graph, learning from the latter makes use of implicit associations such as user-user similarities and item-item similarities, which can provide valuable information especially when the graph is sparse. Moreover, for improving scalability and flexibility, we propose a sampling technique that is specifically designed to capture the two types of proximity relations. Extensive experiments on eight benchmark datasets show that CSE yields significantly better performance than state-of-the-art recommendation methods.
Social bookmarking systems allow users to organise collections of resources on the Web in a collaborative fashion. The increasing popularity of these systems as well as first insights into their emergent semantics have made them relevant to disciplines like knowledge extraction and ontology learning. The problem of devising methods to measure the semantic relatedness between tags and of characterizing it semantically is still largely open. Here we analyze three measures of tag relatedness: tag co-occurrence, cosine similarity of co-occurrence distributions, and FolkRank, an adaptation of the PageRank algorithm to folksonomies. Each measure is computed on tags from a large-scale dataset crawled from the social bookmarking system del.icio.us. To provide a semantic grounding of our findings, a connection to WordNet (a semantic lexicon for the English language) is established by mapping tags into synonym sets of WordNet and applying well-known metrics of semantic similarity there. Our results clearly expose different characteristics of the selected measures of relatedness, making them applicable to different subtasks of knowledge extraction such as synonym detection or discovery of concept hierarchies.
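The first two relatedness measures named above can be sketched as follows. The sketch assumes each post is available as a collection of tags and counts co-occurrence as the number of posts in which two tags appear together. The exact weighting and normalization used in the paper may differ, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def cooccurrence(posts):
    """Map each tag to a dict {other_tag: number of posts containing both}."""
    counts = defaultdict(lambda: defaultdict(int))
    for tags in posts:
        for a, b in combinations(sorted(set(tags)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def cosine_relatedness(counts, tag1, tag2):
    """Cosine similarity between the co-occurrence vectors of two tags."""
    v1 = counts.get(tag1, {})
    v2 = counts.get(tag2, {})
    dot = sum(w * v2.get(other, 0) for other, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # a tag with no co-occurrences has no context to compare
    return dot / (norm1 * norm2)
```

Two tags that never appear in the same post have a co-occurrence count of zero but can still receive a high cosine score when they appear alongside the same other tags, which illustrates why the two measures can behave differently.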
A folksonomy is ostensibly an information structure built up by the wisdom of the crowd, but is the crowd really doing the work? Tagging is in fact a sharply skewed process in which a small minority of supertagger users generate an overwhelming majority of the annotations. Using data from three large-scale social tagging platforms, we explore (a) how best to quantify the imbalance in tagging behavior and formally define a supertagger, (b) how supertaggers differ from other users in their tagging patterns, and (c) whether effects of motivation and expertise inform our understanding of what makes a supertagger. Our results indicate that such prolific users not only tag more than their counterparts, but also tag in quantifiably different ways. Specifically, we find that supertaggers are more likely to label content in the long tail of less popular items, that they show differences in patterns of content tagged and terms utilized, and that they are measurably different with respect to tagging expertise and motivation. These findings suggest we should question the extent to which folksonomies achieve crowdsourced classification via the wisdom of the crowd, especially for broad folksonomies like Last.fm as opposed to narrow folksonomies like Flickr.