Characterizing the spread of exaggerated news content over social media


الملخص بالإنكليزية

In this paper, we consider a dataset comprising press releases about health research from different universities in the UK along with a corresponding set of news articles. First, we do an exploratory analysis to understand how the basic information published in the scientific journals get exaggerated as they are reported in these press releases or news articles. This initial analysis shows that some news agencies exaggerate almost 60% of the articles they publish in the health domain; more than 50% of the press releases from certain universities are exaggerated; articles in topics like lifestyle and childhood are heavily exaggerated. Motivated by the above observation we set the central objective of this paper to investigate how exaggerated news spreads over an online social network like Twitter. The LIWC analysis points to a remarkable observation these late tweets are essentially laden in words from opinion and realize categories which indicates that, given sufficient time, the wisdom of the crowd is actually able to tell apart the exaggerated news. As a second step we study the characteristics of the users who never or rarely post exaggerated news content and compare them with those who post exaggerated news content more frequently. We observe that the latter class of users have less retweets or mentions per tweet, have significantly more number of followers, use more slang words, less hyperbolic words and less word contractions. We also observe that the LIWC categories like bio, health, body and negative emotion are more pronounced in the tweets posted by the users in the latter class. As a final step we use these observations as features and automatically classify the two groups achieving an F1 score of 0.83.

تحميل البحث