No Arabic abstract
Finding influential users in online social networks is an important problem with many possible useful applications. HITS and other link analysis methods, in particular, have been often used to identify hub and authority users in web graphs and online social networks. These works, however, have not considered topical aspect of links in their analysis. A straightforward approach to overcome this limitation is to first apply topic models to learn the user topics before applying the HITS algorithm. In this paper, we instead propose a novel topic model known as Hub and Authority Topic (HAT) model to combine the two process so as to jointly learn the hub, authority and topical interests. We evaluate HAT against several existing state-of-the-art methods in two aspects: (i) modeling of topics, and (ii) link recommendation. We conduct experiments on two real-world datasets from Twitter and Instagram. Our experiment results show that HAT is comparable to state-of-the-art topic models in learning topics and it outperforms the state-of-the-art in link recommendation task.
Peoples personal social networks are big and cluttered, and currently there is no good way to automatically organize them. Social networking sites allow users to manually categorize their friends into social circles (e.g. circles on Google+, and lists on Facebook and Twitter), however they are laborious to construct and must be updated whenever a users network grows. In this paper, we study the novel task of automatically identifying users social circles. We pose this task as a multi-membership node clustering problem on a users ego-network, a network of connections between her friends. We develop a model for detecting circles that combines network structure as well as user profile information. For each circle we learn its members and the circle-specific user profile similarity metric. Modeling node membership to multiple circles allows us to detect overlapping as well as hierarchically nested circles. Experiments show that our model accurately identifies circles on a diverse set of data from Facebook, Google+, and Twitter, for all of which we obtain hand-labeled ground-truth.
A number of recent studies of information diffusion in social media, both empirical and theoretical, have been inspired by viral propagation models derived from epidemiology. These studies model the propagation of memes, i.e., pieces of information, between users in a social network similarly to the way diseases spread in human society. Importantly, one would expect a meme to spread in a social network amongst the people who are interested in the topic of that meme. Yet, the importance of topicality for information diffusion has been less explored in the literature. Here, we study empirical data about two different types of memes (hashtags and URLs) spreading through the Twitters online social network. For every meme, we infer its topics and for every user, we infer her topical interests. To analyze the impact of such topics on the propagation of memes, we introduce a novel theoretical framework of information diffusion. Our analysis identifies two distinct mechanisms, namely topical and non-topical, of information diffusion. The non-topical information diffusion resembles disease spreading as in simple contagion. In contrast, the topical information diffusion happens between users who are topically aligned with the information and has characteristics of complex contagion. Non-topical memes spread broadly among all users and end up being relatively popular. Topical memes spread narrowly among users who have interests topically aligned with them and are diffused more readily after multiple exposures. Our results show that the topicality of memes and users interests are essential for understanding and predicting information diffusion.
We introduce a new paradigm that is important for community detection in the realm of network analysis. Networks contain a set of strong, dominant communities, which interfere with the detection of weak, natural community structure. When most of the members of the weak communities also belong to stronger communities, they are extremely hard to be uncovered. We call the weak communities the hidden community structure. We present a novel approach called HICODE (HIdden COmmunity DEtection) that identifies the hidden community structure as well as the dominant community structure. By weakening the strength of the dominant structure, one can uncover the hidden structure beneath. Likewise, by reducing the strength of the hidden structure, one can more accurately identify the dominant structure. In this way, HICODE tackles both tasks simultaneously. Extensive experiments on real-world networks demonstrate that HICODE outperforms several state-of-the-art community detection methods in uncovering both the dominant and the hidden structure. In the Facebook university social networks, we find multiple non-redundant sets of communities that are strongly associated with residential hall, year of registration or career position of the faculties or students, while the state-of-the-art algorithms mainly locate the dominant ground truth category. In the Due to the difficulty of labeling all ground truth communities in real-world datasets, HICODE provides a promising approach to pinpoint the existing latent communities and uncover communities for which there is no ground truth. Finding this unknown structure is an extremely important community detection problem.
Activity maximization is a task of seeking a small subset of users in a given social network that makes the expected total activity benefit maximized. This is a generalization of many real applications. In this paper, we extend activity maximization problem to that under the general marketing strategy $vec{x}$, which is a $d$-dimensional vector from a lattice space and has probability $h_u(vec{x})$ to activate a node $u$ as a seed. Based on that, we propose the continuous activity maximization (CAM) problem, where the domain is continuous and the seed set we select conforms to a certain probability distribution. It is a new topic to study the problem about information diffusion under the lattice constraint, thus, we address the problem systematically here. First, we analyze the hardness of CAM and how to compute the objective function of CAM accurately and effectively. We prove this objective function is monotone, but not DR-submodular and not DR-supermodular. Then, we develop a monotone and DR-submodular lower bound and upper bound of CAM, and apply sampling techniques to design three unbiased estimators for CAM, its lower bound and upper bound. Next, adapted from IMM algorithm and sandwich approximation framework, we obtain a data-dependent approximation ratio. This process can be considered as a general method to solve those maximization problem on lattice but not DR-submodular. Last, we conduct experiments on three real-world datasets to evaluate the correctness and effectiveness of our proposed algorithms.
Instant quality feedback in the form of online peer ratings is a prominent feature of modern massive online social networks (MOSNs). It allows network members to indicate their appreciation of a post, comment, photograph, etc. Some MOSNs support both positive and negative (signed) ratings. In this study, we rated 11 thousand MOSN member profiles and collected user responses to the ratings. MOSN users are very sensitive to peer ratings: 33% of the subjects visited the researchers profile in response to rating, 21% also rated the researchers profile picture, and 5% left a text comment. The grades left by the subjects are highly polarized: out of the six available grades, the most negative and the most positive are also the most popular. The grades fall into three almost equally sized categories: reciprocal, generous, and stingy. We proposed quantitative measures for generosity, reciprocity, and benevolence, and analyzed them with respect to the subjects demographics.