In the digital era, individuals are increasingly profiled and grouped based on the traces they leave behind in online social networks such as Twitter and Facebook. In this paper, we develop and evaluate a novel text analysis approach for studying user identity and social roles by redefining identity as a sequence of timestamped items (e.g. tweet texts). We operationalise this idea by developing a novel text distance metric, the time-sensitive semantic edit distance (t-SED), which accounts for the temporal context across multiple traces. To evaluate this method we undertake a case study of Russian online-troll activity within US political discourse. The novel metric allows us to classify the social roles of trolls based on their traces, in this case tweets, into one of three predefined categories: left-leaning, right-leaning, and news feed. We show the effectiveness of t-SED in measuring the similarity between tweets while accounting for the temporal context, and we use novel data visualisation techniques and qualitative analysis to uncover new empirical insights into Russian troll activity that have not been identified in previous work. Additionally, we highlight a connection with the field of Actor-Network Theory and the related hypotheses of Gabriel Tarde, and we discuss how social sequence analysis using t-SED may provide new avenues for tackling a longstanding problem in social theory: how to analyse society without separating reality into micro versus macro levels.
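The abstract does not define t-SED, so the following Python sketch is an illustration only: it combines a classic edit-distance dynamic program with a semantic substitution cost and an exponential temporal weight. The similarity function `sim`, the decay scale `tau`, and the direction of the temporal weighting are all hypothetical choices, not the paper's actual construction.

```python
from math import exp

def t_sed(seq_a, seq_b, t_a, t_b, sim, tau=7.0):
    """Hypothetical sketch of a time-sensitive semantic edit distance.

    seq_a, seq_b: token sequences (e.g. tweet words)
    t_a, t_b: timestamps of the two traces, in days
    sim: word-similarity function returning a value in [0, 1] (assumed given)
    tau: temporal decay scale in days (illustrative parameter)
    """
    # Classic edit-distance dynamic program, but the substitution cost
    # shrinks when the two words are semantically similar.
    m, n = len(seq_a), len(seq_b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 1.0 - sim(seq_a[i - 1], seq_b[j - 1])
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    # Down-weight distances between temporally close traces, so that
    # textual similarity within the same news cycle counts for more.
    weight = exp(-abs(t_a - t_b) / tau)
    return weight * d[m][n]
```

With an exact-match similarity (`sim(a, b) = 1.0 if a == b else 0.0`), identical sequences score 0 regardless of timing, while differing sequences posted far apart in time contribute less than the same pair posted on the same day.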
The role of social media in promoting media pluralism was initially viewed as wholly positive. However, some governments allegedly manipulate social media by hiring online commentators (also known as trolls) to spread propaganda and disinformation. In particular, an alleged system of professional trolls operating both domestically and internationally exists in Russia. In 2018, Twitter released data on accounts identified as Russian trolls, starting a wave of research. However, while the foreign-targeted English-language operations of these trolls have received significant attention, no research has analyzed their Russian-language domestic and regional activities. We address this gap by characterizing the Russian-language operations of Russian trolls. We first perform a descriptive analysis, and then focus on the trolls' operations related to the crash of Malaysia Airlines flight MH17. Among other things, we find that Russian-language trolls ran 163 hashtag campaigns (where hashtag use grows abruptly within a month). The main political sentiments of such campaigns were praising Russia and Putin (29%), criticizing Ukraine (26%), and criticizing the United States and Obama (9%). Further, the trolls actively reshared information, with 76% of tweets being retweets or containing a URL. Additionally, we observe periodic temporal patterns of tweeting, suggesting that the trolls use automation tools. We also find that the trolls' information campaign on the MH17 crash was the largest in terms of tweet count. However, around 68% of tweets posted with MH17 hashtags were likely used simply for hashtag amplification. With these tweets excluded, about 49% of the tweets suggested, to varying degrees, that Ukraine was responsible for the crash, and only 13% contained disinformation and propaganda presented as news. Interestingly, the trolls promoted inconsistent alternative theories for the crash.
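The campaign criterion quoted above ("hashtag use grows abruptly within a month") suggests a simple burst-detection heuristic. The sketch below is a hypothetical reconstruction: it flags hashtags whose activity is heavily concentrated in a single month. The thresholds `min_count` and `share` are illustrative and not the paper's actual criteria.

```python
from collections import Counter, defaultdict

def detect_campaigns(tweets, min_count=100, share=0.8):
    """Flag hashtags whose use is concentrated in one month.

    tweets: iterable of (month, hashtags) pairs, where month is a
    'YYYY-MM' string and hashtags is a list of tags in that tweet.
    min_count / share: illustrative thresholds (hypothetical).
    """
    per_month = defaultdict(Counter)  # month -> hashtag counts
    total = Counter()                 # overall hashtag counts
    for month, tags in tweets:
        for tag in tags:
            per_month[month][tag] += 1
            total[tag] += 1
    campaigns = []
    for tag, n in total.items():
        if n < min_count:
            continue  # too rare to call a campaign
        peak = max(counts[tag] for counts in per_month.values())
        if peak / n >= share:  # most activity falls in a single month
            campaigns.append(tag)
    return campaigns
```

A hashtag used 90 times in January and 10 times in February would be flagged under these thresholds, while one spread evenly over two months would not.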
On 6 January 2021, a mob of right-wing conservatives stormed the US Capitol, interrupting the session of Congress certifying the 2020 presidential election results. Immediately after the start of the event, posts related to the riots started to trend on social media. One platform that stood out was Parler, a social media platform endorsing free speech; it has been claimed that the riots were planned and discussed on it. Our report presents a contrast between the trending content on Parler and Twitter around the time of the riots. We collected data from both platforms based on the trending hashtags and drew comparisons based on the topics being discussed, the people active on each platform, and how organic the content generated on the two platforms is. While the content trending on Twitter expressed strong resentment towards the event and called for action against rioters and inciters, Parler content had a strong conservative narrative echoing the voter-fraud claims of the attacking mob. We also find a disproportionately high manipulation of traffic on Parler when compared to Twitter.
Most current approaches to characterize and detect hate speech focus on content posted in Online Social Networks. They face shortcomings in collecting and annotating hateful speech due to the incompleteness and noisiness of OSN text and the subjectivity of hate speech. These limitations are often addressed with constraints that oversimplify the problem, such as considering only tweets containing hate-related words. In this work we partially address these issues by shifting the focus towards users. We develop and employ a robust methodology to collect and annotate hateful users which does not depend directly on a lexicon, and where users are annotated given their entire profile. This results in a sample of Twitter's retweet graph containing 100,386 users, of which 4,972 were annotated. We also collect the users who were banned in the three months that followed the data collection. We show that hateful users differ from normal ones in terms of their activity patterns, word usage, and network structure. We obtain similar results comparing the neighbors of hateful users vs. the neighbors of normal users, and also suspended vs. active users, increasing the robustness of our analysis. We observe that hateful users are densely connected, and thus formulate hate speech detection as a task of semi-supervised learning over a graph, exploiting the network of connections on Twitter. We find that a node embedding algorithm, which exploits the graph structure, outperforms content-based approaches for the detection of both hateful (95% AUC vs. 88% AUC) and suspended users (93% AUC vs. 88% AUC). Altogether, we present a user-centric view of hate speech, paving the way for better detection and understanding of this relevant and challenging issue.
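The abstract names a node embedding algorithm without specifying it. As a minimal, self-contained stand-in for semi-supervised learning over a retweet graph, the sketch below propagates seed annotations (hateful = 1.0, normal = 0.0) to unlabeled users by iterative neighbour averaging; this is label propagation, not the paper's embedding method, and every name and parameter here is hypothetical.

```python
from collections import defaultdict

def propagate_scores(edges, seeds, iters=20):
    """Label propagation over an undirected retweet graph (sketch).

    edges: list of (u, v) undirected edges between users
    seeds: dict user -> 1.0 (annotated hateful) or 0.0 (annotated normal)
    Returns a score in [0, 1] for every user in the graph.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Unlabeled users start at the neutral midpoint 0.5.
    scores = {u: seeds.get(u, 0.5) for u in adj}
    for _ in range(iters):
        new = {}
        for u in adj:
            if u in seeds:
                new[u] = seeds[u]  # clamp annotated users
            else:
                # Unlabeled users take the mean of their neighbours.
                new[u] = sum(scores[v] for v in adj[u]) / len(adj[u])
        scores = new
    return scores
```

On a path graph a–b–c–d with a annotated hateful and d annotated normal, the scores converge towards 2/3 for b and 1/3 for c, reflecting each user's proximity to the hateful seed.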
Unprecedented human mobility has driven rapid urbanization around the world. In China, the fraction of the population dwelling in cities increased from 17.9% to 52.6% between 1978 and 2012. Such large-scale migration poses challenges for policymakers and important questions for researchers. To investigate the process of migrant integration, we employ a one-month complete dataset of telecommunication metadata in Shanghai with 54 million users and 698 million call logs. We find systematic differences between locals and migrants in their mobile communication networks and geographical locations. For instance, migrants have more diverse contacts and move around the city with a larger radius than locals after they settle down. By distinguishing new migrants (who recently moved to Shanghai) from settled migrants (who have been in Shanghai for a while), we demonstrate the integration process of new migrants over their first three weeks. Moreover, we formulate classification problems to predict whether a person is a migrant. Our classifier achieves an F1-score of 0.82 when distinguishing settled migrants from locals, but it remains challenging to identify new migrants because of class imbalance. This classification setup holds promise for identifying new migrants who will successfully integrate into the local population (i.e., new migrants misclassified as locals).
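The F1-score of 0.82 quoted above is the harmonic mean of precision and recall, which is why class imbalance hurts it: a rare positive class depresses recall even when overall accuracy looks high. A minimal helper makes the arithmetic explicit; the counts in the usage note are invented for illustration, not taken from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with 8 true positives, 2 false positives, and 2 false negatives, both precision and recall are 0.8, so F1 = 0.8; with a heavily imbalanced class, missed positives (large `fn`) drag F1 down even if `fp` stays small.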
An infodemic is an emerging phenomenon caused by an overabundance of information online. This proliferation of information makes it difficult for the public to distinguish trustworthy news and credible information from untrustworthy sites and non-credible sources. The perils of an infodemic became apparent with the outbreak of the COVID-19 pandemic, during which bots (i.e., automated accounts controlled by a set of algorithms) were suspected of spreading the infodemic. Although previous research has revealed that bots played a central role in spreading misinformation during major political events, how bots behaved during the infodemic is unclear. In this paper, we examined the roles of bots in the COVID-19 infodemic and the diffusion of non-credible information, such as the 5G and Bill Gates conspiracy theories and content related to Trump and the WHO, by analyzing retweet networks and retweeted items. We show the segregated topology of the retweet networks, which indicates that right-wing self-media accounts and conspiracy theorists may drive this opinion cleavage, while malicious bots may amplify the diffusion of non-credible information. Although human users may exert a larger basic influence on information diffusion than bots, the effects of bots are non-negligible under an infodemic situation.