No Arabic abstract
The massive spread of digital misinformation has been identified as a major global risk and has been alleged to influence elections and threaten democracies. Communication, cognitive, social, and computer scientists are engaged in efforts to study the complex causes for the viral diffusion of misinformation online and to develop solutions, while search and social media platforms are beginning to deploy countermeasures. With few exceptions, these efforts have been mainly informed by anecdotal evidence rather than systematic data. Here we analyze 14 million messages spreading 400 thousand articles on Twitter during and following the 2016 U.S. presidential campaign and election. We find evidence that social bots played a disproportionate role in amplifying low-credibility content. Accounts that actively spread articles from low-credibility sources are significantly more likely to be bots. Automated accounts are particularly active in amplifying content in the very early spreading moments, before an article goes viral. Bots also target users with many followers through replies and mentions. Humans are vulnerable to this manipulation, retweeting bots who post links to low-credibility content. Successful low-credibility sources are heavily supported by social bots. These results suggest that curbing social bots may be an effective strategy for mitigating the spread of online misinformation.
The Turing test aimed to recognize the behavior of a human from that of a computer algorithm. Such challenge is more relevant than ever in todays social media context, where limited attention and technology constrain the expressive power of humans, while incentives abound to develop software agents mimicking humans. These social bots interact, often unnoticed, with real people in social media ecosystems, but their abundance is uncertain. While many bots are benign, one can design harmful bots with the goals of persuading, smearing, or deceiving. Here we discuss the characteristics of modern, sophisticated social bots, and how their presence can endanger online ecosystems and our society. We then review current efforts to detect social bots on Twitter. Features related to content, network, sentiment, and temporal patterns of activity are imitated by bots but at the same time can help discriminate synthetic behaviors from human ones, yielding signatures of engineered social tampering.
Social media platforms attempting to curb abuse and misinformation have been accused of political bias. We deploy neutral social bots who start following different news sources on Twitter, and track them to probe distinct biases emerging from platform mechanisms versus user interactions. We find no strong or consistent evidence of political bias in the news feed. Despite this, the news and information to which U.S. Twitter users are exposed depend strongly on the political leaning of their early connections. The interactions of conservative accounts are skewed toward the right, whereas liberal accounts are exposed to moderate content shifting their experience toward the political center. Partisan accounts, especially conservative ones, tend to receive more followers and follow more automated accounts. Conservative accounts also find themselves in denser communities and are exposed to more low-credibility content.
In this paper, we consider a dataset comprising press releases about health research from different universities in the UK along with a corresponding set of news articles. First, we do an exploratory analysis to understand how the basic information published in the scientific journals get exaggerated as they are reported in these press releases or news articles. This initial analysis shows that some news agencies exaggerate almost 60% of the articles they publish in the health domain; more than 50% of the press releases from certain universities are exaggerated; articles in topics like lifestyle and childhood are heavily exaggerated. Motivated by the above observation we set the central objective of this paper to investigate how exaggerated news spreads over an online social network like Twitter. The LIWC analysis points to a remarkable observation these late tweets are essentially laden in words from opinion and realize categories which indicates that, given sufficient time, the wisdom of the crowd is actually able to tell apart the exaggerated news. As a second step we study the characteristics of the users who never or rarely post exaggerated news content and compare them with those who post exaggerated news content more frequently. We observe that the latter class of users have less retweets or mentions per tweet, have significantly more number of followers, use more slang words, less hyperbolic words and less word contractions. We also observe that the LIWC categories like bio, health, body and negative emotion are more pronounced in the tweets posted by the users in the latter class. As a final step we use these observations as features and automatically classify the two groups achieving an F1 score of 0.83.
The information collected by mobile phone operators can be considered as the most detailed information on human mobility across a large part of the population. The study of the dynamics of human mobility using the collected geolocations of users, and applying it to predict future users locations, has been an active field of research in recent years. In this work, we study the extent to which social phenomena are reflected in mobile phone data, focusing in particular in the cases of urban commute and major sports events. We illustrate how these events are reflected in the data, and show how information about the events can be used to improve predictability in a simple model for a mobile phone users location.
Social media sites are information marketplaces, where users produce and consume a wide variety of information and ideas. In these sites, users typically choose their information sources, which in turn determine what specific information they receive, how much information they receive and how quickly this information is shown to them. In this context, a natural question that arises is how efficient are social media users at selecting their information sources. In this work, we propose a computational framework to quantify users efficiency at selecting information sources. Our framework is based on the assumption that the goal of users is to acquire a set of unique pieces of information. To quantify users efficiency, we ask if the user could have acquired the same pieces of information from another set of sources more efficiently. We define three different notions of efficiency -- link, in-flow, and delay -- corresponding to the number of sources the user follows, the amount of (redundant) information she acquires and the delay with which she receives the information. Our definitions of efficiency are general and applicable to any social media system with an underlying information network, in which every user follows others to receive the information they produce. In our experiments, we measure the efficiency of Twitter users at acquiring different types of information. We find that Twitter users exhibit sub-optimal efficiency across the three notions of efficiency, although they tend to be more efficient at acquiring non-popular than popular pieces of information. We then show that this lack of efficiency is a consequence of the triadic closure mechanism by which users typically discover and follow other users in social media. Finally, we develop a heuristic algorithm that enables users to be significantly more efficient at acquiring the same unique pieces of information.