No Arabic abstract
In this paper we take into account both social and linguistic aspects to perform demographic analysis by processing a large amount of tweets in Basque language. The study of demographic characteristics and social relationships are approached by applying machine learning and modern deep-learning Natural Language Processing (NLP) techniques, combining social sciences with automatic text processing. More specifically, our main objective is to combine demographic inference and social analysis in order to detect young Basque Twitter users and to identify the communities that arise from their relationships or shared content. This social and demographic analysis will be entirely based on the~automatically collected tweets using NLP to convert unstructured textual information into interpretable knowledge.
Scholars, advertisers and political activists see massive online social networks as a representation of social interactions that can be used to study the propagation of ideas, social bond dynamics and viral marketing, among others. But the linked structures of social networks do not reveal actual interactions among people. Scarcity of attention and the daily rythms of life and work makes people default to interacting with those few that matter and that reciprocate their attention. A study of social interactions within Twitter reveals that the driver of usage is a sparse and hidden network of connections underlying the declared set of friends and followers.
One of the new scientific ways of understanding discourse dynamics is analyzing the public data of social networks. This researchs aim is Post-structuralist Discourse Analysis (PDA) of Covid-19 phenomenon (inspired by Laclau and Mouffes Discourse Theory) by using Intelligent Data Mining for Persian Society. The examined big data is five million tweets from 160,000 users of the Persian Twitter network to compare two discourses. Besides analyzing the tweet texts individually, a social network graph database has been created based on retweets relationships. We use the VoteRank algorithm to introduce and rank people whose posts become word of mouth, provided that the total information spreading scope is maximized over the network. These users are also clustered according to their word usage pattern (the Gaussian Mixture Model is used). The constructed discourse of influential spreaders is compared to the most active users. This analysis is done based on Covid-related posts over eight episodes. Also, by relying on the statistical content analysis and polarity of tweet words, discourse analysis is done for the whole mentioned subpopulations, especially for the top individuals. The most important result of this research is that the Twitter subjects discourse construction is government-based rather than community-based. The analyzed Iranian society does not consider itself responsible for the Covid-19 wicked problem, does not believe in participation, and expects the government to solve all problems. The most active and most influential users similarity is that political, national, and critical discourse construction is the predominant one. In addition to the advantages of its research methodology, it is necessary to pay attention to the studys limitations. Suggestion for future encounters of Iranian society with similar crises is given.
The role of social media in promoting media pluralism was initially viewed as wholly positive. However, some governments are allegedly manipulating social media by hiring online commentators (also known as trolls) to spread propaganda and disinformation. In particular, an alleged system of professional trolls operating both domestically and internationally exists in Russia. In 2018, Twitter released data on accounts identified as Russian trolls, starting a wave of research. However, while foreign-targeted English language operations of these trolls have received significant attention, no research has analyzed their Russian language domestic and regional-targeted activities. We address this gap by characterizing the Russian-language operations of Russian trolls. We first perform a descriptive analysis, and then focus in on the trolls operation related to the crash of Malaysia Airlines flight MH17. Among other things, we find that Russian-language trolls have run 163 hashtag campaigns (where hashtag use grows abruptly within a month). The main political sentiments of such campaigns were praising Russia and Putin (29%), criticizing Ukraine (26%), and criticizing the United States and Obama (9%). Further, trolls actively reshared information with 76% of tweets being retweets or containing a URL. Additionally, we observe periodic temporal patterns of tweeting suggesting that trolls use automation tools. Further, we find that trolls information campaign on the MH17 crash was the largest in terms of tweet count. However, around 68% of tweets posted with MH17 hashtags were likely used simply for hashtag amplification. With these tweets excluded, about 49% of the tweets suggested to varying levels that Ukraine was responsible for the crash, and only 13% contained disinformation and propaganda presented as news. Interestingly, trolls promoted inconsistent alternative theories for the crash.
The history of journalism and news diffusion is tightly coupled with the effort to dispel hoaxes, misinformation, propaganda, unverified rumours, poor reporting, and messages containing hate and divisions. With the explosive growth of online social media and billions of individuals engaged with consuming, creating, and sharing news, this ancient problem has surfaced with a renewed intensity threatening our democracies, public health, and news outlets credibility. This has triggered many researchers to develop new methods for studying, understanding, detecting, and preventing fake-news diffusion; as a consequence, thousands of scientific papers have been published in a relatively short period, making researchers of different disciplines to struggle in search of open problems and most relevant trends. The aim of this survey is threefold: first, we want to provide the researchers interested in this multidisciplinary and challenging area with a network-based analysis of the existing literature to assist them with a visual exploration of papers that can be of interest; second, we present a selection of the main results achieved so far adopting the network as an unifying framework to represent and make sense of data, to model diffusion processes, and to evaluate different debunking strategies. Finally, we present an outline of the most relevant research trends focusing on the moving target of fake-news, bots, and trolls identification by means of data mining and text technologies; despite scholars working on computational linguistics and networks traditionally belong to different scientific communities, we expect that forthcoming computational approaches to prevent fake news from polluting the social media must be developed using hybrid and up-to-date methodologies.
Social media provide access to behavioural data at an unprecedented scale and granularity. However, using these data to understand phenomena in a broader population is difficult due to their non-representativeness and the bias of statistical inference tools towards dominant languages and groups. While demographic attribute inference could be used to mitigate such bias, current techniques are almost entirely monolingual and fail to work in a global environment. We address these challenges by combining multilingual demographic inference with post-stratification to create a more representative population sample. To learn demographic attributes, we create a new multimodal deep neural architecture for joint classification of age, gender, and organization-status of social media users that operates in 32 languages. This method substantially outperforms current state of the art while also reducing algorithmic bias. To correct for sampling biases, we propose fully interpretable multilevel regression methods that estimate inclusion probabilities from inferred joint population counts and ground-truth population counts. In a large experiment over multilingual heterogeneous European regions, we show that our demographic inference and bias correction together allow for more accurate estimates of populations and make a significant step towards representative social sensing in downstream applications with multilingual social media.