Today's social media platforms enable both authentic and fake news to spread very quickly. Some approaches have been proposed to automatically detect such fake news based on their content, but it is difficult to agree on universal criteria of authenticity (which can be bypassed by adversaries once known). Besides, it is obviously impossible to have each news item checked by a human. In this paper, we propose a mechanism to limit the spread of fake news which is not based on content. It can be implemented as a plugin on a social media platform. The principle is as follows: a team of fact-checkers reviews a small number of news items (the most popular ones), which enables an estimation of each user's inclination to share fake news items. Then, using a Bayesian approach, we estimate the trustworthiness of future news items, and treat accordingly those of them that pass a certain untrustworthiness threshold. We then evaluate the effectiveness and overhead of this technique on a large Twitter graph. We show that having a few thousand users exposed to one given news item enables a very precise estimation of its reliability. We thus identify more than 99% of fake news items with no false positives. The performance impact is very small: the induced overhead on the 90th percentile latency is less than 3%, and less than 8% on the throughput of user operations.
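The abstract does not spell out the exact Bayesian model, but the idea can be sketched as follows: each user's inclination to share fake news is estimated from the fact-checked items they shared, and the inclinations of a new item's sharers are combined into a posterior probability that the item is fake. A minimal Python sketch under these assumptions (the prior `p_fake_prior`, the independence assumption across sharers, and the 0.95 flagging threshold are illustrative, not taken from the paper):

```python
import numpy as np

def user_inclination(n_fake_shared, n_checked_shared, alpha=1.0, beta=1.0):
    """Beta-Bernoulli estimate of a user's inclination to share fake news,
    based only on the items that fact-checkers have reviewed."""
    return (n_fake_shared + alpha) / (n_checked_shared + alpha + beta)

def item_fake_posterior(sharer_inclinations, p_fake_prior=0.1):
    """Combine the sharers' inclinations into a posterior probability that a
    new item is fake, treating each inclination as the likelihood of that
    share under the 'fake' hypothesis (a deliberate simplification)."""
    p = np.asarray(sharer_inclinations)
    log_fake = np.log(p_fake_prior) + np.log(p).sum()
    log_real = np.log(1.0 - p_fake_prior) + np.log(1.0 - p).sum()
    return 1.0 / (1.0 + np.exp(log_real - log_fake))

# Illustrative inclinations of the users who shared a new item.
sharers = [user_inclination(7, 10), user_inclination(6, 9), user_inclination(8, 10)]
if item_fake_posterior(sharers) > 0.95:  # untrustworthiness threshold (illustrative)
    print("limit further spread of this item")
```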
In this paper, we consider a dataset comprising press releases about health research from different universities in the UK along with a corresponding set of news articles. First, we do an exploratory analysis to understand how the basic information published in scientific journals gets exaggerated as it is reported in these press releases or news articles. This initial analysis shows that some news agencies exaggerate almost 60% of the articles they publish in the health domain; more than 50% of the press releases from certain universities are exaggerated; and articles on topics like lifestyle and childhood are heavily exaggerated. Motivated by these observations, we set the central objective of this paper to investigate how exaggerated news spreads over an online social network like Twitter. A LIWC analysis points to a remarkable observation: the late tweets are essentially laden with words from the 'opinion' and 'realize' categories, which indicates that, given sufficient time, the wisdom of the crowd is actually able to tell apart the exaggerated news. As a second step, we study the characteristics of the users who never or rarely post exaggerated news content and compare them with those who post exaggerated news content more frequently. We observe that the latter class of users have fewer retweets or mentions per tweet, have significantly more followers, use more slang words, fewer hyperbolic words, and fewer word contractions. We also observe that LIWC categories like bio, health, body, and negative emotion are more pronounced in the tweets posted by the users in the latter class. As a final step, we use these observations as features and automatically classify the two groups, achieving an F1 score of 0.83.
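As an illustration of the final classification step, here is a minimal sketch with synthetic data and an off-the-shelf classifier; the abstract does not name the classifier used, and the feature layout below simply mirrors the listed cues, so everything here is illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the per-user features named in the abstract:
# retweets/mentions per tweet, follower count, slang-word ratio, hyperbolic-word
# ratio, contraction ratio, and LIWC bio/health/body/negative-emotion proportions.
n_users = 400
X = rng.normal(size=(n_users, 9))
y = rng.integers(0, 2, size=n_users)  # 1 = frequently posts exaggerated content

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("mean F1 across folds:", cross_val_score(clf, X, y, scoring="f1", cv=5).mean())
```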
Recent years have witnessed remarkable progress towards computational fake news detection. To mitigate the negative impact of fake news, we argue that it is critical to understand what user attributes potentially cause users to share fake news. The key to this causal-inference problem is to identify confounders, i.e., variables that cause spurious associations between treatments (e.g., user attributes) and the outcome (e.g., user susceptibility). In fake news dissemination, confounders can be characterized by fake news sharing behavior, which inherently relates to user attributes and online activities. Learning such user behavior is typically subject to selection bias in the users who are susceptible to sharing news on social media. Drawing on causal inference theories, we first propose a principled approach to alleviating selection bias in fake news dissemination. We then treat the learned unbiased fake news sharing behavior as a surrogate confounder that can fully capture the causal links between user attributes and user susceptibility. We theoretically and empirically characterize the effectiveness of the proposed approach and find that it could be useful in protecting society from the perils of fake news.
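The abstract does not give the exact estimator, but inverse-propensity weighting is one standard way to alleviate the kind of selection bias described: users whose sharing is more likely to be observed are down-weighted when learning sharing behavior. A hypothetical sketch on synthetic data (all variable names and models are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
user_attrs = rng.normal(size=(500, 8))      # user attributes / online activities
observed = rng.integers(0, 2, size=500)     # 1 if the user's sharing is observed
shared_fake = rng.integers(0, 2, size=500)  # outcome (only meaningful when observed)

# Estimate each user's propensity of being observed, then reweight the
# observed users by the inverse of that propensity when learning behavior.
propensity = LogisticRegression().fit(user_attrs, observed) \
    .predict_proba(user_attrs)[:, 1].clip(0.05, 0.95)
weights = observed / propensity

behaviour_model = LogisticRegression().fit(
    user_attrs[observed == 1],
    shared_fake[observed == 1],
    sample_weight=weights[observed == 1],
)
```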
Although significant effort has been devoted to fact-checking, the prevalence of fake news on social media, which has a profound impact on justice, public trust and our society, remains a serious problem. In this work, we focus on propagation-based fake news detection, as recent studies have demonstrated that fake news and real news spread differently online. Specifically, considering the capability of graph neural networks (GNNs) in dealing with non-Euclidean data, we use GNNs to differentiate between the propagation patterns of fake and real news on social media. In particular, we concentrate on two questions: (1) Without relying on any text information, e.g., tweet content, replies and user descriptions, how accurately can GNNs identify fake news? Machine learning models are known to be vulnerable to adversarial attacks, and avoiding the dependence on text-based features can make the model less susceptible to manipulation by advanced fake news fabricators. (2) How do we deal with new, unseen data? In other words, how does a GNN trained on a given dataset perform on a new and potentially vastly different dataset? If it achieves unsatisfactory performance, how do we solve the problem without re-training the model on the entire data from scratch? We study the above questions on two datasets with thousands of labelled news items, and our results show that: (1) without any text information, GNNs achieve performance comparable or superior to state-of-the-art methods; (2) GNNs trained on a given dataset may perform poorly on new, unseen data, and direct incremental training cannot solve the problem; this issue has not been addressed in previous work that applies GNNs to fake news detection. To solve the problem, we propose a method that achieves balanced performance on both existing and new datasets by using techniques from continual learning to train GNNs incrementally.
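A minimal sketch of a propagation-graph classifier of the kind described, using PyTorch Geometric; the actual architecture, the choice of non-textual node features, and the continual-learning strategy are not specified in the abstract and are assumptions here:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class PropagationGNN(torch.nn.Module):
    """Graph classifier over a news propagation cascade: nodes are the users who
    spread the item, edges are retweet/reply links, and node features are
    non-textual user signals (e.g. follower count, account age)."""
    def __init__(self, in_dim, hidden_dim=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.out = torch.nn.Linear(hidden_dim, 2)  # fake vs real

    def forward(self, x, edge_index, batch):
        h = F.relu(self.conv1(x, edge_index))
        h = F.relu(self.conv2(h, edge_index))
        h = global_mean_pool(h, batch)  # one embedding per cascade
        return self.out(h)
```

For the incremental setting, a replay-style continual-learning scheme (mixing a small buffer of old cascades into each new training batch) is one common way to balance performance on old and new datasets, though the paper's specific technique is not described in the abstract.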
Basic human values represent a set of values, such as security, independence, success, kindness, and pleasure, which we deem important to our lives. Each of us holds different values with different degrees of significance. Existing studies show that the values of a person can be identified from their social network usage. However, the value priorities of a person may change over time due to factors such as life experiences, influence, social structure, and technology. Existing studies do not analyze how users' values change under social influence, i.e., group persuasion, in social media usage. In our research, we first predict users' value scores from the influence of their friends, based on their social media usage. We propose a Bounded Confidence Model (BCM)-based value dynamics model, built from 275 different ego networks on Facebook, that predicts how social influence may persuade a person to change their values over time. Then, to improve prediction, we use a particle swarm optimization-based hyperparameter tuning technique. We observe that these optimized hyperparameters produce accurate future value scores. We also run our approach with different machine learning based methods and find that support vector regression (SVR) outperforms the other regression models. Using SVR with the best hyperparameters of the BCM model, we obtain the lowest Mean Squared Error (MSE) of 0.00347.
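A minimal sketch of one Bounded Confidence Model update over an ego network, under the usual BCM assumptions: a confidence bound epsilon and a convergence rate mu, which would be the hyperparameters a method like particle swarm optimization tunes. The details below are illustrative rather than the paper's exact formulation:

```python
import numpy as np

def bcm_step(values, adjacency, epsilon=0.2, mu=0.3):
    """One Bounded Confidence Model update over an ego network.
    values[i]  : current value score of user i (e.g. in [0, 1])
    adjacency  : boolean friendship matrix
    epsilon    : confidence bound; mu: convergence rate (tunable)."""
    new_values = values.copy()
    for i in range(len(values)):
        friends = np.where(adjacency[i])[0]
        # Only friends whose value is within the confidence bound exert influence.
        close = friends[np.abs(values[friends] - values[i]) <= epsilon]
        if len(close) > 0:
            new_values[i] += mu * (values[close].mean() - values[i])
    return new_values

# Toy ego network of four fully connected users (illustrative).
values = np.array([0.2, 0.5, 0.9, 0.4])
adjacency = np.ones((4, 4), dtype=bool) & ~np.eye(4, dtype=bool)
print(bcm_step(values, adjacency))
```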
Social media is currently one of the most important means of news communication. Since people consume a large fraction of their daily news through social media, most traditional news channels use social media to catch the attention of users. Each news channel has its own strategies to attract more users. In this paper, we analyze how news channels use sentiment to garner users' attention on social media. We compare the sentiment of social media news posts of television, radio and print media to show the differences in the ways these channels cover the news. We also analyze users' reactions and opinion sentiment on news posts with different sentiments. We perform our experiments on a dataset extracted from the Facebook Pages of five popular news channels. Our dataset contains 0.15 million news posts and 1.13 billion user reactions. The results of our experiments show that the sentiment of user opinion has a strong correlation with the sentiment of the news post and the type of information source. Our study also illustrates the differences among the social media news channels of different types of news sources.
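The reported association between post sentiment and user-opinion sentiment could be checked with a simple Pearson correlation between each post's sentiment score and the mean sentiment of its user comments; the scores below are placeholders, and the sentiment tool used to produce them is left unspecified:

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder sentiment scores in [-1, 1]: one per post, paired with the mean
# sentiment of the user comments on that post (from any sentiment analyzer).
post_sentiment    = np.array([0.6, -0.4, 0.1, 0.8, -0.7])
comment_sentiment = np.array([0.5, -0.3, 0.0, 0.7, -0.6])

r, p = pearsonr(post_sentiment, comment_sentiment)
print(f"correlation between post and user-opinion sentiment: r={r:.2f} (p={p:.3f})")
```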